Query sequence being compared: ELLIS-012-FIG2AB.PEP (1-256)

Number of sequences searched: 30847 

Number of scores above cutoff: 4007 

Results of the initial comparison of ELLIS-012-FIG2AB.PEP (1-256) with:
Data bank : A-GeneSeq 11, all entries
STDEV -1 0 1 2
PARAMETERS


Similarity matrix           Unitary
K-tuple                     2
Mismatch penalty            5
Joining penalty             30
Gap penalty                 1.00
Window size                 32
Gap size penalty            0.26
Cutoff score                0
Randomization group         0
Initial scores to save      40
Alignments to save          15
Optimized scores to save    0
Display context             50


SEARCH STATISTICS 


Scores:                          Mean: 3    Median: 4    Standard Deviation: 1.25
Times:                           CPU: 00:01:03.08    Total Elapsed: 00:02:09.00
Number of residues:              4048030
Number of sequences searched:    30847
Number of scores above cutoff:   4007



Cut-off raised to 2. 

Cut-off raised to 3. 

Cut-off raised to 4. 

Cut-off raised to 5. 

Cut-off raised to 6.

The scores below are sorted by initial score.
Significance is calculated based on initial score. 
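
The Sig. values in this report are consistent with the initial score expressed as a number of standard deviations above the mean score. The following is a minimal Python sketch under that assumption, using the mean (3) and standard deviation (1.25) reported under SEARCH STATISTICS. The function name `significance` is hypothetical and is not part of the search program.

```python
def significance(initial_score, mean=3.0, stdev=1.25):
    """Return how many standard deviations a score lies above the mean.

    Assumption (inferred from this report, not from program documentation):
    Sig. = (initial score - mean) / standard deviation.
    """
    return (initial_score - mean) / stdev


# Initial scores that occur in the list of best scores.
for score in (10, 9, 8):
    print(score, f"{significance(score):.2f}")
```

Under this reading, initial scores of 10, 9 and 8 give 5.60, 4.80 and 4.00, which matches the three significance bands in the list.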


The list of best scores is:

                                                    Init.  Opt.
Sequence Name Description                    Length Score Score   Sig. Frame

**** 5 standard deviations above mean ****
 1. R04747    Amino acid sequence of modifi     231    10    23   5.60     0
 2. R04751    Amino acid sequence of maize      235    10    22   5.60     0
 3. R04749    Amino acid sequence of maize      235    10    22   5.60     0
 4. R04748    Amino acid sequence of maize      235    10    23   5.60     0
**** 4 standard deviations above mean ****
 5. R28289    HI-30 N-terminal sequence.         20     9     9   4.80     0
 6. P91700    Protein increasing pulmonary       23     9     9   4.80     0
 7. P91701    Protein increasing pulmonary       35     9     9   4.80     0
 8. R30953    Rabbit whey acidic protein.       127     9    16   4.80     0
 9. P81110    Sequence of new fusion protei     352     9    34   4.80     0
10. R31046    Rat D1B dopamine receptor.        475     9    18   4.80     0
11. R21082    Dopamine D1 receptor encoded      477     9    19   4.80     0
12. R22546    Truncated Dopamine D1 recepto     479     9    20   4.80     0
**** 3 standard deviations above mean ****
13. R31224    Transmembrane region of HIV-1      28     8     9   4.00     0
14. R27470    HIV-1 (IIIB) env transmembran      28     8     9   4.00     0
15. R15248    Carbohydrate binding domain #      32     8     9   4.00     0
16. R22089    Human MK protein.                 143     8    15   4.00     0
17. P80745    Sequence of AAs 600-750 of HI     150     8    13   4.00     0
18. R24301    Glycopeptide resistance prote     161     8    15   4.00     0
19. P20007    Hybrid human leukocyte interf     187     8    15   4.00     0
20. P20103    Sequence encoded by leukocyte     188     8    15   4.00     0
21. R20564    O-glycosylated IFN-alpha2c.       188     8    15   4.00     0
22. R20549    Human IFNalpha 2C from pAD19B     138     8    15   4.00     0
23. R11802    Sporamin A encoded by the cDN     219     8    17   4.00     0
24. R11356    Alkaline phosphatase-IFN alph     219     8    17   4.00     0
25. P95375    Sequence of lipase of Bacillu     247     8    30   4.00     0
26. P70831    Sequence of lipase of Bacillu     247     8    31   4.00     0
27. R06495    Beta 3 adrenergic receptor.       402     8    36   4.00     0
28. R12395    Transcription activator.          406     8    16   4.00     0
29. R05539    Rat D2 dopamine receptor.         415     8    14   4.00     0
30. R30886    ETb receptor.                     442     8    29   4.00     0
31. R10544    D2 dopamine receptor long iso     444     8    15   4.00     0
32. R22499    IGARSYGI-CPlasminogen 347-541     467     8    35   4.00     0
33. R22032    Truncated human urinary throm     475     8    35   4.00     0
34. R22503    CGARSYQI-IPlasminogen 347-541     476     8    35   4.00     0
35. R22013    Truncated human thrombomoduli     480     8    35   4.00     0
36. R13877    Thrombin-binding substances (     486     8    35   4.00     0
37. R24400    Recombinant thrombin-binding      494     8    35   4.00     0
38. R10617    Soluble thrombomodulin deriv.     515     8    35   4.00     0
39. R22018    Human thrombomodulin (1-516)      516     8    35   4.00     0
40. R22017    Human thrombomodulin (1-516)      516     8    35   4.00     0



1. ELLIS-012-FIG2AB.PEP (1-256)

R04747 Amino acid sequence of modified 19 kD maize zein e

ID R04747 standard; protein; 231 AA.
AC R04747;
DT 05-AUG-1990 (first entry)
DE Amino acid sequence of modified 19 kD maize zein encoded by clone cZ19A2
KW Maize zein; lysine substitution.
OS Maize.
PN US4885357-A.
PD 05-DEC-1989.
PF 21-APR-1988; 184348.
PR 21-APR-1988; US-184348.
PA (LUBR) Lubrizol Corp (PURD).
PI Larkins B, Cuellar RE, Wallace JC;
DR WPI; 90-050879/07.
DR N-PSDB; Q03295.
PT New modified zein contg. lysine residues -
PT with better nutritional balance, prepd. by expressing mutated
PT zein gene
PS Disclosure; Fig 4; 18pp; English.
CC The patent concerns a modified 19 or 22 kD zein which includes Lys in the
CC internal repeated region of the zein. This is the amino acid sequence of
CC a modified 19 kD zein. It has better nutritional balance than unmodified
CC zein (which lacks Lys), but retains the other properties zein - ability
CC to form protein bodies within the rough endoplasmic reticulum of the
CC host cell, and solubility in alcohol.
SQ Sequence 231 AA;
SQ 32 A; 2 R; 9 N; 2 D; 0 B; 3 C; 40 Q; 1 E; 0 Z; 6 G; 3 H;
SQ 10 I; 45 L; 0 K; 1 M; 15 F; 24 P; 15 S; 8 T; 0 W; 8 Y; 7 V;

Initial Score = 10 Optimized Score = 23 Significance = 5.60
Residue Identity = 21% Matches = 33 Mismatches = 102
Gaps = 16 Conservative Substitutions = 0

70 80 90 100 110 X 120 130 

CRVCAGYFRFKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTGVCRP

I II II 
IFCFLHLLG-LSASAATATIFP 
X 10 20 

140 150 160 170 180 190 200 

WTNCSLDGRSVLKTGTTEKDV — VCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSAL LLA- 

II I I II I I I I I II I III 

— QCSQTPIASLLPPYLSPAVSSVCENP — IL3PYRIQQAIAAGILPLSPLFL9SPSALLQQLPLVHLLAQ 
30 40 50 60 70 80 

210 220 230 240 250 X 

LIFITLL-FSVLKWIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL 
I I II I II II I I 
NIRASQLQ3LVLGNLAAYSQHSQFLPF— NQLAALNSAAYL3QQ3QLPFSQLAAAYPQQFLPFNGLAALNSA 
90 100 110 120 130 140 X 150 

AYLQQQSPLPFSQLADVSPATFLTQPQLLPFYLHA 
160 170 180 190
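
The Residue Identity figures in these records are consistent with the number of matches divided by the total number of aligned columns (matches, mismatches and gaps), with the fractional part dropped. This is an inference from the numbers in this report, not a documented formula, and the function name `residue_identity` is hypothetical. A minimal Python sketch using the counts reported for entry 1 (33 matches, 102 mismatches, 16 gaps):

```python
def residue_identity(matches, mismatches, gaps):
    """Percent identity over all aligned columns, truncated to an integer.

    Assumption: the denominator is matches + mismatches + gaps.
    """
    aligned_columns = matches + mismatches + gaps
    return 100 * matches // aligned_columns


# Counts reported for entry 1 (R04747).
print(residue_identity(33, 102, 16))
```

With the entry 1 counts this gives 21, and with the entry 5 counts (9 matches, 7 mismatches, 0 gaps) it gives 56.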


2. ELLIS-012-FIG2AB.PEP (1-256)

R04751 Amino acid sequence of maize zein encoded by clone

ID R04751 standard; protein; 235 AA.
AC R04751;
DT 05-AUG-1990 (first entry)
DE Amino acid sequence of maize zein encoded by clone cZ19C1
KW Maize zein; lysine substitution; clone cZ19C1.
OS Maize.
PN US4885357-A.
PD 05-DEC-1989.
PF 21-APR-1988; 184348.
PR 21-APR-1988; US-184348.
PA (LUBR) Lubrizol Corp (PURD).
PI Larkins B, Cuellar RE, Wallace JC;
DR WPI; 90-050879/07.
DR N-PSDB; Q04373.
PT New modified zein contg. lysine residues -
PT with better nutritional balance, prepd. by expressing mutated
PT zein gene
PS Disclosure; Fig 4; 18pp; English.
CC The patent concerns a modified 19 or 22 kD zein which includes Lys in the
CC internal repeated region of the zein. This is the amino acid sequence of
CC a modified 19 kD zein. It has better nutritional balance than unmodified
CC zein (which lacks Lys), but retains the other properties zein - ability
CC to form protein bodies within the rough endoplasmic reticulum of the
CC host cell, and solubility in alcohol.
SQ Sequence 235 AA;
SQ 37 A; 3 R; 9 N; 1 D; 0 B; 3 C; 39 Q; 1 E; 0 Z; 5 G; 3 H;
SQ 11 I; 48 L; 1 K; 3 M; 14 F; 21 P; 17 S; 5 T; 0 W; 8 Y; 6 V;

Initial Score = 10 Optimized Score = 22 Significance = 5.60
Residue Identity = 21% Matches = 32 Mismatches = 98
Gaps = 19 Conservative Substitutions = 0


60 70 80 90 100 110 120 130 

NCNICRVCAGYFRFKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTG


MAAKIFCLIHLLG-LSASAATA 
X 10 20 


140 150 160 170 180 190 200 

VCRPWTNCSLDGRSVL--KTGTTEKDVVCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLAL 

i ii i ii i i i i iii mi i 

SIFP--QCSQAPIASLLPPYLSPAMSSVCENP — ILLPYRIQQAIAAG ILPLSPLFLS3SSALLQQL 

30 40 50 60 70 80 


210 220 230 240 250 X 

IFITLLFSVLKWIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL

II II I II I I 

PLVHLL — AQMIR AQ9LQSLVLANLAAYSQ9QQFLPFNQLAALNSAAYLQ9Q3LLPFSQLAAAYPRQ 

90 100 110 120 130 X 140 

FLPFN0LAALNSHAYVQ8QQLLPFSQLAAVSPA 
150 160 170 180 
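
Each record carries two SQ composition lines whose residue counts should add up to the stated sequence length, which gives a simple check on digits that were misread during scanning. The following is a minimal Python sketch of that check. The helper name `composition_total` is hypothetical, and the two lines passed to it are the entry 2 composition as read here with the scanning errors corrected, which is itself an assumption.

```python
import re


def composition_total(sq_lines):
    """Sum the residue counts found in 'SQ' composition lines."""
    total = 0
    for line in sq_lines:
        # Each field looks like '<count> <residue code or Others>;'
        for count, _code in re.findall(r"(\d+)\s+([A-Za-z]+);", line):
            total += int(count)
    return total


# Composition lines for entry 2 (R04751), stated length 235 AA.
lines = [
    "SQ 37 A; 3 R; 9 N; 1 D; 0 B; 3 C; 39 Q; 1 E; 0 Z; 5 G; 3 H;",
    "SQ 11 I; 48 L; 1 K; 3 M; 14 F; 21 P; 17 S; 5 T; 0 W; 8 Y; 6 V;",
]
print(composition_total(lines))
```

A total that differs from the length on the "SQ Sequence" line suggests that at least one count in the composition was misread.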


3. ELLIS-012-FIG2AB.PEP (1-256) 

R04749 Amino acid sequence of maize zein encoded by clone 

ID R04749 standard; protein; 235 AA. 

AC R04749; 

DT 05-AUG-1990 (first entry) 

DE Amino acid sequence of maize zein encoded by clone cZ19AB1
KW Maize zein; lysine substitution.
OS Maize.
PN US4885357-A.
PD 05-DEC-1989.
PF 21-APR-1988; 184348.
PR 21-APR-1988; US-184348.
PA (LUBR) Lubrizol Corp (PURD).
PI Larkins B, Cuellar RE, Wallace JC;
DR WPI; 90-050879/07.
PT New modified zein contg. lysine residues -
PT with better nutritional balance, prepd. by expressing mutated
PT zein gene
PS Disclosure; Fig 4; 18pp; English.
CC The patent concerns a modified 19 or 22 kD zein which includes Lys in the
CC internal repeated region of the zein. This is the amino acid sequence of
CC a modified 19 kD zein. It has better nutritional balance than unmodified
CC zein (which lacks Lys), but retains the other properties zein - ability
CC to form protein bodies within the rough endoplasmic reticulum of the
CC host cell, and solubility in alcohol.
SQ Sequence 235 AA;
SQ 37 A; 3 R; 9 N; 1 D; 0 B; 3 C; 39 Q; 1 E; 0 Z; 5 G; 3 H;
SQ 11 I; 48 L; 1 K; 3 M; 14 F; 21 P; 17 S; 5 T; 0 W; 8 Y; 6 V;

Initial Score = 10 Optimized Score = 22 Significance = 5.60
Gaps = 19 Conservative Substitutions = 0


60 70 80 90 100 110 120 130 

NCNICRVCAGYFRFKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTG


MAAKIFCLIMLLG-LSASAATA 
X 10 20 


140 150 160 170 180 190 200 

VCRPWTNCSLDGRSVL--KTGTTEKDVVCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLAL

I I! I II I I I I III llll I 

SIFP— QCSQAP1ASLLPPYLSPAMSSVCENP — ILLPYRIQQAI AAG I LPLSPLFLQQSSALL8QL 

30 40 50 60 70 80 


210 220 230 240 250 X 

IFITLLFSVLKWIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL

II II I II II 

PLVHLL — AQNIR AQQLGQLVLANLAAYSQQQQFLPFNGLAALNSAAYLQQQQLLPFSQLAAAYPRQ 

90 100 110 120 130 X 140 

FLPFNQLAALNSHAYVQQQQLLPFSQLAAVSPA 
150 160 170 180 


4. ELLIS-012-FIG2AB.PEP (1-256) 

R04748 Amino acid sequence of maize zein encoded by clone

ID R04748 standard; protein; 235 AA.
AC R04748;
DT 05-AUG-1990 (first entry)
DE Amino acid sequence of maize zein encoded by clone cZ19B1
KW Maize zein; lysine substitution.
OS Maize.
PN US4885357-A.
PD 05-DEC-1989.
PF 21-APR-1988; 184348.
PR 21-APR-1988; US-184348.
PA (LUBR) Lubrizol Corp (PURD).
PI Larkins B, Cuellar RE, Wallace JC;
DR WPI; 90-050879/07.
DR N-PSDB; Q03296.
PT New modified zein contg. lysine residues -
PT with better nutritional balance, prepd. by expressing mutated
PT zein gene
PS Disclosure; Fig 4; 18pp; English.
CC The patent concerns a modified 19 or 22 kD zein which includes Lys in the
CC internal repeated region of the zein. This is the amino acid sequence of
CC a modified 19 kD zein. It has better nutritional balance than unmodified
CC zein (which lacks Lys), but retains the other properties zein - ability
CC to form protein bodies within the rough endoplasmic reticulum of the
CC host cell, and solubility in alcohol.
SQ Sequence 235 AA;
SQ 27 A; 2 R; 10 N; 0 D; 0 B; 3 C; 42 Q; 1 E; 0 Z; 8 G; 2 H;
SQ 10 I; 45 L; 1 K; 2 M; 15 F; 23 P; 19 S; 9 T; 0 W; 8 Y; 8 V;

Initial Score = 10 Optimized Score = 23 Significance = 5.60
Residue Identity = 22% Matches = 33 Mismatches = 97
Gaps = 19 Conservative Substitutions = 0


60 70 80 90 100 110 120 130 

NCNICRVCAGYFRFKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTG


MAAKIFCFLMLLG-LSASAATA 
X 10 20 


VCRPWTNCSLDGRSVLKTGTTEKDV--VCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLAL

i ii i i ii i i i i 111 mi i 

TIFP — QCSQTPITSLLPPYLSSAVSBVCENP — ILQPYRIQQAIAAG I LPLSPLFLQQSSALLQQL 

30 40 50 60 70 80 


210 220 230 240 250 X 

IFITLLFSVLKWIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL

II II I II II 

PLVHLL — AQNIR AO9LOQLVLGNLAAYSQ0QQFLPFNSLGSLNSSAYLSQQQ0LPFSSLPAAYPQQ 

90 100 110 120 130 X 140 

FLPFNQLAALNSPAYLQQQQLLPFSGLAGVSPA 
150 160 170 180 


5. ELLIS-012-FIG2AB.PEP (1-256) 

R28289 HI-30 N-terminal sequence.

ID R28289 standard; peptide; 20 AA.
AC R28289;
DT 19-MAR-1993 (first entry)
DE HI-30 N-terminal sequence.
KW White blood cell adhesion; wbc; endothelial cells; treatment;
KW sepsis; inflammation; arthritis.
OS Homo sapiens.
FH Key Location/Qualifiers
FT Region 12
FT /note= "uncertain residue"
FT Region 17
FT /note= "uncertain residue"
FT Region 18
FT /note= "uncertain residue"
FT Region 19
FT /note= "uncertain residue"
PN WO9218160-A.
PD 29-OCT-1992.
PF 16-APR-1992; U03132.
PR 17-APR-1991; US-637300.
PR 14-MAY-1991; US-700526.
PA (CETU) CETUS CORP.
PA (CETU) CETUS ONCOLOGY CORP.
PI Houston LL, Kaymakcalan Z, Liu DY;
DR WPI; 92-381785/46.
PT Use of alpha 1 micro-globulin, HI-30 or inter-alpha-trypsin inhibitor
PT light chain - to inhibit adhesion of white blood cells to endothelial
PT cells, for treating sepsis, inflammation and arthritis
PS Disclosure; Page 18; 41pp; English.
CC The sequence is that of the N-terminal sequence of human HI-30
CC which can be used therapeutically or prophylactically to reduce,
CC prevent or alter the adhesion of white blood cells to endothelial
CC cells, pref. to reduce adhesion between leukocytes to endothelial
CC cells that line blood cell walls. It can be used to treat disease
CC states, e.g. sepsis, inflammation, arthritis, atherosclerosis,
CC autoimmune disease, rheumatoid arthritis, acute and chronic
CC inflammation, acute respiratory distress syndrome, ischemia/reperfusion
CC injury, inflammatory bowel disease, haemolytic transfusion reaction,
CC certain cancers, transplantation or trauma (e.g. burns).
CC See also R28288-R28292.
SQ Sequence 20 AA;
SQ 1 A; 0 R; 0 N; 1 D; 0 B; 0 C; 2 Q; 4 E; 0 Z; 4 G; 0 H;
SQ 0 I; 2 L; 2 K; 0 M; 0 F; 1 P; 0 S; 1 T; 0 W; 0 Y; 2 V;

Initial Score = 9 Optimized Score = 9 Significance = 4.80
Residue Identity = 56% Matches = 9 Mismatches = 7
Gaps = 0 Conservative Substitutions = 0



200 210 220 230 240 250 X 

LALTSALLLALIFITLLFSVLKWIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL


AVLPQEEEGGGGQLVTKKED 
X 10 X 20 


6. ELLIS-012-FIG2AB.PEP (1-256) 

P91700 Protein increasing pulmonary surfactant activity. 

ID P91700 standard; protein; 23 AA.
AC P91700;
DT 13-JUN-1990 (first entry)
DE Protein increasing pulmonary surfactant activity.
KW Pulmonary surfactant; respiratory disorders;
OS Homo sapiens.
FH Key Location/Qualifiers
FT Misc-difference 14
FT /label=ile, gly or val, pref. ile
FT Misc-difference 16
FT /label=ile, gly or val, pref. gly
PN WO8900167-A.
PD 12-JAN-1989.
PF 29-JUN-1988; 00361.
PR 01-JUL-1987; SE-027249.
PR 22-SEP-1987; SE-036612.
PA (KABI) Kabigen Ab.
PI Curstedt T, Robertsson B, Jornvall H;
DR WPI; 89-039631/05.
PT Proteins with pulmonary surfactant activity -
PT obtd. from pig lung and human broncho-alveolar lavage or
PT amniotic fluid, for treating respiratory disorders.
PS Claim 1; Page 16; 24pp; English.
CC Proteins, derived from bronchoalveolar lavage and amniotic fluid, can be
CC extracted and shown to have pulmonary surfactant activity. Useful in
CC treating respiratory disorders, reducing surface tension at air-liquid
CC interface.
SQ Sequence 23 AA;
SQ 1 A; 0 R; 0 N; 0 D; 0 B; 0 C; 0 Q; 0 E; 0 Z; 2 G; 0 H;
SQ 1 I; 7 L; 0 K; 1 M; 0 F; 0 P; 0 S; 0 T; 0 W; 0 Y; 9 V;
SQ 2 Others;

Initial Score = 9 Optimized Score = 9 Significance = 4.80
Residue Identity = 39% Matches = 9 Mismatches = 14
Gaps = 0 Conservative Substitutions = 0


X 10 20 30 40 50 60 70 

MGNNCYNVVVIVLLLVGCEKVGAVQNSCDNCQPGTFCRKYNPVCKSCPPSTFSSIGGQPNCNICRVCAGYFR


LLVVVVVVLL VVVX I XGALLHGL 
X 10 20 X 


FKKFCS 


7. ELLIS-012-FIG2AB.PEP (1-256) 

P91701 Protein increasing pulmonary surfactant activity. 

ID P91701 standard; protein; 35 AA. 

AC P91701; 

DT 13-JUN-1990 (first entry) 

DE Protein increasing pulmonary surfactant activity. 

KW Pulmonary surfactant; respiratory disorders; 

OS Sus scrofa. 


FT Misc-difference 1
FT /label=leu or phe
FT Misc-difference 9
FT /label=asn or his
FT Misc-difference 26
FT /label=ile, gly or val, pref. gly
FT Misc-difference 28
FT /label=ile, gly or val, pref. gly
PN WO8900167-A.
PD 12-JAN-1989.
PF 29-JUN-1988; 00361.
PR 01-JUL-1987; SE-027249.
PR 22-SEP-1987; SE-036612.
PA (KABI) Kabigen Ab.
PI Curstedt T, Robertsson B, Jornvall H;
DR WPI; 89-039631/05.
PT Proteins with pulmonary surfactant activity -
PT obtd. from pig lung and human broncho-alveolar lavage or
PT amniotic fluid, for treating respiratory disorders.
PS Claim 2; Page 16; 24pp; English.
CC Proteins, derived from pig lung, can be extracted and shown to have
CC pulmonary surfactant activity. Useful in treating respiratory disorders,
CC reducing surface tension at the air-liquid interface.
SQ Sequence 35 AA;
SQ 1 A; 2 R; 0 N; 0 D; 0 B; 2 C; 0 Q; 0 E; 0 Z; 2 G; 0 H;
SQ 2 I; 8 L; 1 K; 1 M; 0 F; 2 P; 0 S; 0 T; 0 W; 0 Y; 10 V;
SQ 4 Others;

Initial Score = 9 Optimized Score = 9 Significance = 4.80
Residue Identity = 32% Matches = 9 Mismatches = 19
Gaps = 0 Conservative Substitutions = 0

X 10 20 30 40 50 60 

MGNNCYNVVVIVLLLVGCEKVGAVQNSCDNCQPGTFCRKYNPVCKSCPPSTFSSIGGQPNCNICR

III I III II 

XRIPCCPVXLKRLLVVVVVVLLVVVXIXGALLHGL 
10 20 30 X 


70 

VCAGYFRFKKFCS 


8. ELLIS-012-FIG2AB.PEP (1-256)

R30953 Rabbit whey acidic protein.

ID R30953 standard; Protein; 127 AA.
AC R30953;
DT 07-MAY-1993 (first entry)
DE Rabbit whey acidic protein.
KW WAP; promoter; heterologous protein production.
OS Oryctolagus cuniculus.
PN WO9222644-A.
PD 23-DEC-1992.
PF 12-JUN-1992; F00533.
PR 12-JUN-1991; FR-007179.
PA (INRG) INRA INST NAT RECH AGRONOMIQUE.
PI Devinoy E, Houdebine L, Thepot D;
DR WPI; 93-018131/02.
DR N-PSDB; Q34591.
PT Heterologous protein prodn. in milk of transgenic mammal - contg.
PT structural gene under control of promoter of rabbit acidic whey
PT protein, e.g. for human growth hormone
PS Disclosure; Fig 5; 38pp; French.
CC The expression control elements from at least a 3kb fragment from
CC the 3'-end of the complete rabbit WAP gene are fused to a sequence
CC encoding a heterologous protein, such as human hormone,
CC erythropoietin, granulocyte colony stimulating factor,
CC alpha-antitrypsin, hirudin, urokinase and Factor VIII. The rabbit
CC WAP promoter is far more efficient at expressing such proteins in
CC primary mammalian epithelial cells (induced by prolactin and
CC glucocorticoids) than rat or mouse WAP promoters. The preferred
CC regulatory region is a 6.3kb HindIII-BamHI fragment or a 17kb
CC HindIII-EcoRI fragment from the region immediately upstream of the
CC rabbit WAP gene (The sequence of only the first 1821 bases upstream
CC of the first exon is given in the specification).
SQ Sequence 127 AA;
SQ 13 A; 6 R; 2 N; 5 D; 0 B; 14 C; 4 Q; 9 E; 0 Z; 6 G; 0 H;
SQ 6 I; 16 L; 4 K; 4 M; 2 F; 12 P; 12 S; 5 T; 1 W; 1 Y; 5 V;

Initial Score = 9 Optimized Score = 16 Significance = 4.80
Residue Identity = 22% Matches = 18 Mismatches = 58
Gaps = 3 Conservative Substitutions = 0

140 150 160 170 180 190 200 

GVCRPWTNCSLDGRSVLKTGTTEKDVVCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTL-FLALTSALLLAL

I I I III II II 

HRCLISLALGLLALEAALALAP 
X 10 20 

210 220 230 240 250 X 

IFI--TLLFSVLKWIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL

II I III I I 

KFIAPVGVMCPEPSSSEETLCLSDNDCLGSTVCCPSAAGGSCRTPIIVPTPKAGRCPWVQAPMLSQLCEELS 
30 40 50 60 70 80 90 

DCANDIECRGDKKCCFSRCAMRYLEPILESTPQ 
100 110 120 


9. ELLIS-012-FIG2AB.PEP (1-256) 

P81110 Sequence of new fusion protein contg. alpha-1-micr

ID P81110 standard; protein; 352 AA.
AC P81110;
DT 06-DEC-1990 (first entry)
DE Sequence of new fusion protein contg. alpha-1-microglobulin (AMG)
DE and the HI-30 region of inter-alpha-trypsin inhibitor (ITI) light chain
KW Serine protease; enzyme; pancreatitis; atherosclerosis;
KW chronic inflammation; therapy; elastase.
OS Homo sapiens.
FH Key Location/Qualifiers
FT Protein 20..202
FT /label=AMG
FT Protein 206..350
FT /label=HI-30
FT Domain 226..282
FT /label=I
FT Domain 283..352
FT /label=II
FT Misc-difference 291..292
FT /note="Differs from the protein sequence of HI-30
FT purified from urine"
FT Misc-difference 343
FT /note="Differs from the protein sequence of HI-30
FT purified from urine"
PN EP-255011-A.
PD 03-FEB-1988.
PF 20-JUL-1987; 110461.
PR 29-JUL-1986; US-891469.
PA (MILE) Miles Laboratories Inc.
PI Kaumeyer JF, Kotick HP, Polazzi JO;
DR WPI; [illegible]
DR N-PSDB; N81432.
PT New DNA sequence coding for fusion protein contg. alpha-microglobulin -
PT and inter-alpha-trypsin inhibitor useful for treating excessive
PT elastase prodn.
PS Disclosure; p; English.
CC A fusion protein of the ITI light chain comprising AMG and HI-30 is
CC claimed. ITI is serine protease, potentially used for treating excessive
CC release of hydrolytic enzymes, esp. elastase, in conditions such as
CC pancreatitis, atherosclerosis and chronic inflammation.
SQ Sequence 352 AA;
SQ 21 A; 18 R; 13 N; 12 D; 0 B; 16 C; 13 Q; 28 E; 0 Z; 36 G; 4 H;
SQ 15 I; 27 L; 18 K; 10 M; 14 F; 19 P; 20 S; 26 T; 5 W; 15 Y; 22 V;

Initial Score = 9 Optimized Score = 34 Significance = 4.80
Residue Identity = 19% Matches = 46 Mismatches = 168
Gaps = 23 Conservative Substitutions = 0

10 20 X 30 40 50 60 70



MGNNCYNVVVIVLLLVGCEKVGAVQNSCDNCQPGTFCRKYNPVCKSCPPSTFSSIGGQPNCNICR--VCAGY


MRSLGALLLLLSACLAVSAGPVPTPPDNIGVGENFNISRIYGKHYN 
X 10 20 30 40 

80 90 100 110 120 130 140 

FRFKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTGVCRPWTNCSLD

I II I I I I I I I I 

LAIGSTCPHLKKIMDRMTVSTLVLGEGATEAE-ISHTSTRMRKGVCEETS-GAYEKTDTDG KFLY 

50 60 70 80 90 100 

150 160 170 180 190 200 210 

GRSVLKTGTTEKDVVCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLALIFITLLFSVLKWI

I I I II I I i I I II I II 

HKS-KHNITMESYVVHTTYDEYAI FLTKKFSRHHGPTI TAKLYGRAPQLRET — LL — QDFRVVAQGV — GI 
110 120 130 140 150 160 170 

220 230 240 250 X 

RKKFPHIFKQPFKKTTGAAQEEDACSCR CPQEEEGGGGGYEL

I I I llllll III 

PEDSIFTMADRGECVPGEQEPEPILIPRVPRAVLPQEEEGSGGGQLVTEVTKKEDSCQLGYSAGPCMGMTSR 
180 190 200 210 220 230 240 

YFYNGTSMACETFQYGGCMGNGNNF 
250 260 270 


10. ELLIS-012-FIG2AB.PEP (1-256) 

R31046 Rat D1B dopamine receptor.

ID R31046 standard; Protein; 475 AA.
AC R31046;
DT 26-MAY-1993 (first entry)
DE Rat D1B dopamine receptor.
KW PCR; amplify; degenerate; primer; TM; transmembrane region; human; D1;
KW dopamine; receptor; probe; rat; pBLUESCRIPT II SK+; testis; DR5; D1B;
KW genomic library; lambdaDASH II; Kozak; consensus sequence; V-15.
OS Rattus rattus.
PN WO9218533-A.
PD 29-OCT-1992.
PF 16-APR-1992; U03187.
PR 16-APR-1991; US-686591.
PA (UYDU-) UNIV DUKE.
PI Caron MG, Jarvie KR, Tiberi M;
DR WPI; 93-036060/04.
DR N-PSDB; Q35143.
PT Cloned gene encoding rat D1b dopamine receptor - used to screen
PS Disclosure; Page 25-28; 39pp; English.
CC This sequence represents rat D1B dopamine receptor. The DNA
CC sequence encoding this polypeptide was isolated using the primer
CC sequences given in Q35146-47. These oligomers are degenerate primers
CC corresponding to the 5th and 6th transmembrane (TM) regions of the
CC human D1 dopamine receptor. These primers were used to amplify
CC sheared human DNA and the amplification products were subcloned into
CC the sequencing vector pBLUESCRIPT II SK+. A 230bp fragment (V-15) was
CC found to correspond to the 5th TM region, the 3rd intracellular loop
CC and the 6th TM region. V-15 was used as a template for the synthesis
CC of a 32P-labeled probe. This probe was used to screen a rat testis
CC genomic library in lambdaDASH II. One isolated clone (DR5) had an
CC open reading frame of 1425 bp (475 amino acids) which contained the
CC full coding sequence for rat D1B-dopamine receptor. The predicted
CC encoded protein has a molecular weight of 52834. The putative
CC initiator methionine was selected on the basis of the best Kozak
CC consensus sequence found in frame with the remainder of the coding
CC block and preceded by a stop codon.
SQ Sequence 475 AA;
SQ 41 A; 26 R; 17 N; 18 D; 0 B; 15 C; 16 Q; 27 E; 0 Z; 27 G; 7 H;
SQ 34 I; 42 L; 13 K; 12 M; 25 F; 26 P; 40 S; 28 T; 13 W; 11 Y; 37 V;

Initial Score = 9 Optimized Score = 18 Significance = 4.80
Residue Identity = 23% Matches = 28 Mismatches = 72
Gaps = 17 Conservative Substitutions = 0


100 110 120 130 140 X 150 160 

PQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTGVCRPWTNCSLDGRSVLKTG--TTEKDVVCG-PPVVS


MLPPGRNRTA9PARLGLQRQLA 
X 10 20 

170 130 190 200 210 220 230 

FSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLALIFITLLFSVL--KWIRKKFPHIFK-QPFKKTTGAA

I II I III I II II I III II I I I 

QVDAPAGSATPLG PAQVVTAGL-LT — LL — I VWTLLGNVLVCAAI VRSRHLRAKMTNIF I VSLAV 

30 40 50 60 70 80 

240 250 X 

QEEDACSCRCPQEEEGGGGGYEL 


SDLFVALLVMPWKAVAEVAGYWPFGTFCDI HVAFDIMCSTASILNLCI ISVDRYWAISRPFRYERKHT QRVA 
90 100 X 110 120 130 140 150 


L 


11. ELLIS-012-FIG2AB.PEP (1-256) 

R21082 Dopamine D1 receptor encoded by clone GL-30. 

ID R21082 standard; Protein; 477 AA. 

AC R21082; 

DT 20-MAY-1992 (first entry)
DE Dopamine D1 receptor encoded by clone GL-30.
KW G-protein-coupled receptor; Parkinson's Disease; schizophrenia;
KW tardive dyskinesia; dopamine D1-beta receptor subtype.
OS Homo sapiens.
FH Key Location/Qualifiers
FT Domain 42..66
FT /label= transmembrane
FT /note= "I"
FT Domain 78..101
FT /label= transmembrane
FT /note= "II"
FT /label= transmembrane
FT /note= "III"
FT Domain 156..172
FT /label= transmembrane
FT /note= "IV"
FT Domain 224..246
FT /label= transmembrane
FT /note= "V"
FT Domain 294..315
FT /label= transmembrane
FT /note= "VI"
FT Domain 337..361
FT /label= transmembrane
FT /note= "VII"
FT Modified_site 7..9
FT /label= glycosylation
PN WO9200986-A.
PD 23-JAN-1992.
PF 10-JUL-1991; U04858.
PR 10-JUL-1990; US-551448.
PA (NEUR-) NEUROGENETIC CORP.
PI Weinshank RL, Hartig PR;
DR WPI; 92-056815/07.
DR N-PSDB; Q21014.
PT Nucleic acid sequences encoding human dopamine D1 receptor - and
PT anti-sense oligo-nucleotide(s), useful in treating and diagnosing
PT abnormal D1 receptor expression e.g. dementia, etc.
PS Claim 5; Fig 1; 90pp; English.
CC Clone GL-30 was isolated from a human spleen library by screening
CC with a 1.6kb XbaI-BamHI fragment from the human serotonin receptor
CC gene. The clone was sequenced and found to have an open reading
CC frame encoding a 477 amino acid protein of mol. wt. 53kD. A
CC comparison of the protein sequence to sequences of known
CC neurotransmitter receptors indicated that clone GL-30 is a new
CC member of the G protein-coupled receptor family of molecules which
CC span the lipid bilayer seven times. The extracellular loop of GL-30
CC (between transmembrane regions IV and V) is the longest
CC extracellular loop 2 of all the known G protein-coupled receptors.
CC GL-30 has greatest homology with the dopamine D1 receptor, i.e.
CC overall homology of 62 per cent and homology within the
CC transmembrane domains of 83 per cent.
SQ Sequence 477 AA;
SQ 46 A; 20 R; 24 N; 20 D; 0 B; 16 C; 16 Q; 21 E; 0 Z; 25 G; 8 H;
SQ 32 I; 40 L; 10 K; 13 M; 27 F; 29 P; 38 S; 23 T; 15 W; 13 Y; 41 V;

Initial Score = 9 Optimized Score = 19 Significance = 4.80
Residue Identity = 23% Matches = 27 Mismatches = 76
Gaps = 13 Conservative Substitutions = 0


100 110 120 130 140 X 150 160 

PQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTGVCRPWTNCSLDGRSVLKTGTTEKDVVCGPPVVSFSP


MLPPG-SNGT AYPGQFALYQQL 
X 10 20 


170 180 190 200 210 220 230 

STTISVTPEGG--PGGHSLQVLTLFLALTSALLLALIFITLLFSVL--KWIRKKFPHIFKQPFKKTTGAAQ

I I I I I II I I II I! I III II I 1 

AQGNAVGG'SAGAPPLGPS-QVVTACL-LT — LL — I IMTLLGNVLVCAAI VRSRKLRANMTNVFI VSLAVS 
30 40 50 60 70 80 


240 250 X 

EEDACSCRCPQEEEGGGGGYEL 

I II 

DLFVALLVMPUKAVAEVAGYWAFGAFCDVUVAFD I HCST AS I LNLCV I SVDRY1IA ISRPFRYKRKHTfiRHAL 
90 100 110 120 130 140 150


12. ELLIS-012-FIG2AB.PEP (1-256)

R22546 Truncated Dopamine D1 receptor encoded by pseudoge

ID R22546 standard; Protein; 479 AA.
AC R22546;
DT 20-MAY-1992 (first entry)
DE Truncated Dopamine D1 receptor encoded by pseudogene clone GL-39.
KW G-protein-coupled receptor; Parkinson's Disease; schizophrenia;
KW tardive dyskinesia; dopamine D1-beta receptor subtype.
OS Homo sapiens.
FH Key Location/Qualifiers
FT Modified_site 7..9
FT /label= glycosylation
FT Domain 42..66
FT /label= transmembrane
FT /note= "I"
FT Domain 78..101
FT /label= transmembrane
FT /note= "II"
FT Domain 117..138
FT /label= transmembrane
FT /note= "III"
FT Domain 158..174
FT /label= transmembrane
FT /note= "IV"
FT Misc_difference 190
FT /note= "corresponds to nonsense codon -
FT i.e. protein is truncated"
FT Domain 226..248
FT /label= transmembrane
FT /note= "V"
FT Domain 296..317
FT /label= transmembrane
FT /note= "VI"
FT Domain 339..362
FT /label= transmembrane
FT /note= "VII"
FT Misc_difference 457
FT /note= "corresponds to nonsense codon"
PN WO9200986-A.
PD 23-JAN-1992.
PF 10-JUL-1991; U04858.
PR 10-JUL-1990; US-551448.
PA (NEUR-) NEUROGENETIC CORP.
PI Weinshank RL, Hartig PR;
DR WPI; 92-056815/07.
PT Nucleic acid sequences encoding human dopamine D1 receptor - and
PT anti-sense oligo-nucleotide(s), useful in treating and diagnosing
PT abnormal D1 receptor expression e.g. dementia, etc.
PS Disclosure; Fig 2; 90pp; English.
CC Clone GL-39 encodes a truncated (and therefore inactive) dopamine
CC D1 receptor having strong homology to the full-length receptor
CC encoded by GL-30 (see 021082).
SQ Sequence 479 AA;
SQ 41 A; 21 R; 22 N; 19 D; 0 B; 17 C; 16 Q; 23 E; 0 Z; 23 G; 8 H;
SQ 30 I; 40 L; 11 K; 16 M; 24 F; 33 P; 41 S; 23 T; 14 W; 12 Y; 43 V;
SQ 2 Others;

Initial Score = 9 Optimized Score = 20 Significance = 4.80
Residue Identity = 25% Matches = 31 Mismatches = 72
Gaps = 19 Conservative Substitutions = 0


90 100 110 120 130 140 150 160 

rcuri cpnrTPf'FWi'/’pprnri ruarrurrci rTrMnntirTf'iif'pouTijf'ci ni'nniii u'TrTTCvmmrrpp 



HLPPRS — NGT AYPGQL 

X 10 


170 180 190 200 210 220 

VVSFSPSTTISVTPEGG--PGGHSLQVLTLFLALTSALLLALIFITLLFSVL--KWIRKKFPHIFK-QPFKK

1 I ! I Ml I I II II i III II I I 
ALYQQLASGNAVGGSAGAPPLGPS-OVVTACL-LT--LL — 1 1 UTLLGN VLHS A A I VRTRHLR AKHTN VF I 
20 30 40 50 60 70 80 


230 240 250 X 

TTGAAQEEDACSCRCPQEEEGGGGGYEL 

! I II 

VSLAVSDLFVALLVMPMKAVAEVAGYWPFEAFCDVMVAFDIMCSTASILNLCVSVISVGRYHAISRPFRYER 
90 100 110 120 130 140 150 


KMTQRM 


13. ELLIS-012-F1G2AB.PEP (1-256) 

R31224 Transmembrane region of HIV-1 1 1 1 IB) env. 

ID R31224 standard; peptide; 28 AA. 

AC R31224; 

DT 18— MAY— 1993 (first entry) 

DE Transmembrane region of HIV-1 (IIIB) env. 

KM Human immunodeficiency virus; fusion protein; transnembrane anchor; 

KM env; Tl; T2; TH4.1; epitope. 

OS Synthetic. 

PN WO9222641-A.

PD 23-DEC-1992.

PF 12-JUN-1992; U05107.

PR 14-JUN-1991; US-715921.

PR 11-JUN-1992; US-897382.

PA (VIRO-) VIROGENETICS CORP.

PI Cox WI, Paoletti E, Tartaglia J;

DR WPI; 93-018128/02.

PT Modified recombinant virus with inactivated non-essential genetic
PT functions - comprises e.g. vaccinia or avipox virus, used as HIV
PT vaccine

PS Example 32; Page 102; 159pp; English. 

CC Fusion peptides expressed by recombinant poxviruses include the 51
CC amino acid N-terminal portion of HIV-1 (IIIB) env, residues 1-50
CC (plus an initiating Met). The signal sequence is followed by the
CC T1, T2 and TH4.1 epitopes separated from the signal, each other, and
CC the anchor sequence where present, by a cleavable linker region up to
CC 5 amino acids in length. The anchor domain is a 28 amino acid
CC transmembrane region of HIV-1 (IIIB) env (sequence shown).
CC See also R31218-26.

SQ Sequence 28 AA;
SQ 1 A; 3 R; 1 N; 0 D; 0 B; 0 C; 1 Q; 0 E; 0 Z; 4 G; 0 H;
SQ 3 I; 4 L; 0 K; 1 M; 2 F; 0 P; 1 S; 0 T; 0 W; 0 Y; 7 V;

Initial Score = 8 Optimized Score = 9 Significance = 4.00

Residue Identity = 32% Matches = 9 Mismatches = 19

Gaps = 0 Conservative Substitutions = 0


X 10 20 30 X 40 50 60 70


MGNNCYNVVVIVLLLVGCEKVGAVQNSCDNCQPGTFCRKYNPVCKSCPPSTFSSIGGQPNCNICRVCAGYFR 


LFIMIVGGLVGLRIVFAVLSVVNRVRQG 
X 10 20 X 

80 





14. ELLIS-012-FIG2AB.PEP (1-256)

R27470 HIV-1 (IIIB) env transmembrane region.

ID R27470 standard; Protein; 28 AA. 

AC R27470; 

DT 24-FEB-1993 (first entry) 

DE HIV-1 (IIIB) env transmembrane region.

KW T1; T2; TH4.1; epitope; HIV-1; env; transmembrane anchor domain;

KW vP1060; vP1061; vCP154; vCP148; fusion peptide; signal sequence;

KW cleavable linker; H6 promoter; polymerase chain reaction; PCR;

KW vaccinia virus.

OS Synthetic.

FH Key Location/Qualifiers

FT Binding_site 62

FT /note= "Transmembrane anchor region binding site"

PN WO9215672-A.

PD 17-SEP-1992.

PF 09-MAR-1992; U01906.

PR 07-MAR-1991; US-666056.

PR 11-JUN-1991; US-713967.

PR 06-MAR-1992; US-847951.

PA (VIRO-) VIROGENETICS CORP.

PI Cox WI, De Taisne C, Francis J, Gettig RR, Johnson GP;

PI Limbach KJ, Norton EK, Paoletti E, Perkus ME, Pincus SE;

PI Riviere M, Tartaglia J, Taylor J;

DR WPI; 92-331718/40.

PT Vaccine comprises recombinant, attenuated pox-virus - use for
PT vaccinating against viral infections such as rabies, hepatitis B,
PT HIV, HSV, EBV, CMV, mumps etc.

PS Disclosure; Page 327; 456pp; English. 

CC The sequences given in 935846-51 and R27468-70 were used for the
CC expression of two fusion peptides containing the T1, T2 and TH4.1
CC epitopes of HIV-1 env with and without a transmembrane anchor domain
CC from HIV-1 env. Plasmids vP1060, vP1061, vCP154 and vCP148 were
CC generated to express a fusion peptide consisting of the signal
CC sequences from HIV-1 env coupled to sequences corresponding to the T1,
CC T2 and TH4.1 epitopes of HIV-1 env by cleavable linker. vP1060 and
CC vCP154 differ from vP1061 and vCP148 in that the former recombinant
CC viruses express the fusion protein along with sequences corresponding
CC to the transmembrane region of HIV-1 env. The HIV-1 (IIIB) env signal
CC region and vaccinia virus H6 promoter are derived by polymerase chain
CC reaction (PCR). The remainder of the coding regions for construction
CC without the transmembrane region were also produced by PCR. For the
CC version containing the transmembrane region the 3' end of the
CC amplification product was altered to accommodate the transmembrane region.
CC See also 835501-864.

SQ Sequence 28 AA; 

SQ 1 A; 3 R; 1 N; 0 D; 0 B; 0 C; 1 Q; 0 E; 0 Z; 4 G; 0 H;

SQ 3 I; 4 L; 0 K; 1 M; 2 F; 0 P; 1 S; 0 T; 0 W; 0 Y; 7 V;


Initial Score = 8 Optimized Score = 9 Significance = 4.00

Residue Identity = 32% Matches = 9 Mismatches = 19

Gaps = 0 Conservative Substitutions = 0


X 10 20 30 X 40 50 60 70 
MGNNCYNVVVIVLLLVGCEKVGAVQNSCDNCQPGTFCRKYNPVCKSCPPSTFSSIGGQPNCNICRVCAGYFR

II III I II I 
LFIMIVGGLVGLRIVFAVLSVVNRVRQG
X 10 20 X 


60 

FKKFCSSTHNAE 



15. ELLIS-012-FIG2AB.PEP (1-256) 

R15248 Carbohydrate binding domain #5. 

ID R15248 standard; Protein; 32 AA.

AC R15248;

DT 12-FEB-1992 (first entry)

DE Carbohydrate binding domain #5.

KW cellulose; CBD; hemicellulosic substrate;

KW Trichoderma reesei; cellulase; terminal A region.

PN WO9117244-A.

PD 14-NOV-1991.

PF 08-MAY-1991; DK0124.

PR 09-MAY-1990; DK-001158.

PA (NOVO ) NOVO NORDISK A/S.

PI Woldike HF, Hagen F, Hjort CM, Hastrup S;

DR WPI; 91-353766/43.

PT New fungal (hemi)cellulose degrading enzymes - for prodn. of liq.
PT fuel gas and feed protein, have specified carbohydrate binding domain
PS Claim 20; Page 45; 73pp; English.

CC This CBD is homologous to a terminal A region of Trichoderma reesei
CC cellulases and effects binding of a protein to an insoluble
CC cellulosic or hemicellulosic substrate. It is one of ten specific
CC CBD's (see R15244-R15253) which correspond to the generic CBD
CC formulae in R15242 and R15243. The CBD is incorporated into a fusion
CC protein comprising a catalytic domain from a cellulase, e.g. a
CC Bacillus endoglucanase, and optionally comprising a linking B domain
CC from e.g. a fungal endoglucanase.

SQ Sequence 32 AA; 

SQ 1 A; 1 R; 2 N; 0 D; 0 B; 5 C; 7 Q; 0 E; 0 Z; 6 G; 0 H;

SQ 0 I; 1 L; 0 K; 0 M; 0 F; 1 P; 2 S; 2 T; 3 W; 1 Y; 0 V;

Initial Score = 8 Optimized Score = 9 Significance = 4.00

Residue Identity = 29% Matches = 10 Mismatches = 22

Gaps = 2 Conservative Substitutions = 0
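
The report does not say how Residue Identity is derived from the other counts. As a rough cross-check only, the sketch below assumes identity is matches divided by all aligned positions (matches + mismatches + gaps), truncated to a whole percent. That assumption reproduces the 29 reported for this entry and the 32 reported for the 28-residue HIV-1 env entries, but it is an inference from these figures, not documented FastDB behaviour. The function name is hypothetical.

```python
def residue_identity(matches: int, mismatches: int, gaps: int) -> int:
    """Percent identity over all aligned positions, truncated to an integer.

    Assumption: gaps count as aligned positions in the denominator.
    """
    aligned = matches + mismatches + gaps
    return 100 * matches // aligned


# Counts copied from the peptide entries in this report.
print(residue_identity(10, 22, 2))  # carbohydrate binding domain #5 entry
print(residue_identity(9, 19, 0))   # HIV-1 (IIIB) env transmembrane entries
```
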

60 70 80 90 100 X 110 120 

SSIGGQPNCNICRVCAGYFRFKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTF 

II II II II 

WGQCGGQ--GWQGPTCCSQGTC
X 10 20 

130 X 140 150 160 170 180 

NDQNGTGVCRPWTNCSLDGRSVLKTGTTEKDVVCGPPVVSFSPSTTISVTPEGGPGGHSLQV

II 

RAQNQWYSQCLN 
30 X 



IntelliGenetics

FastDB - Fast Pairwise Comparison of Sequences
Release 5.4

Results file ellis-012-fig2ab-ngs.res made by shears on Tue 14 Sep 93 15:38:42-PDT.


Query sequence being compared: ELLIS-012-FIG2AB.SEQ (1-2350)

Number of sequences searched: 30843

Number of scores above cutoff: 4307

Results of the initial comparison of ELLIS-012-FIG2AB.SEQ (1-2350) with:
Data bank : N-GeneSeq 11, all entries

[Histogram of initial scores. Vertical axis: NUMBER OF SEQUENCES (ticks 10000, 5000, 10). Horizontal axis: SCORE and STDEV.]

SCORE 0 16 32 49 65 81 97 114 130 146
STDEV 0 1 2 3 4 5 6 7


PARAMETERS

Similarity matrix          Unitary    K-tuple              4
Mismatch penalty           1          Joining penalty      30
Gap penalty                1.00       Window size          32
Gap size penalty           0.33
Cutoff score               10
Randomization group        0

Initial scores to save     40         Alignments to save   15
Optimized scores to save   0          Display context      50


SEARCH STATISTICS

Scores:    Mean    Median    Standard Deviation
           22      16        16.13

Times:     CPU            Total Elapsed
           00:13:08.00    00:26:40.00

Number of residues: 16009476

Number of sequences searched: 30843 

Number of scores above cutoff: 4307 

Cut-off raised to 15. 

Cut-off raised to 26. 

Cut-off raised to 32. 

Cut-off raised to 38. 

Cut-off raised to 42. 

The scores below are sorted by initial score.

Significance is calculated based on initial score.

A 100% identical sequence to the query sequence was not found.
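
The Sig. column in the list that follows is consistent with a plain z-score of the initial score against the search statistics reported above (mean 22, standard deviation 16.13). The sketch below illustrates that reading; the function name is hypothetical, and the formula is inferred from the reported numbers rather than taken from FastDB documentation.

```python
def significance(initial_score: float, mean: float, stdev: float) -> float:
    """Number of standard deviations by which a score exceeds the mean."""
    return (initial_score - mean) / stdev


MEAN = 22       # "Mean" from SEARCH STATISTICS
STDEV = 16.13   # "Standard Deviation" from SEARCH STATISTICS

# Initial scores taken from the first, second and last rows of the list.
for score in (146, 141, 110):
    print(score, f"{significance(score, MEAN, STDEV):.2f}")
```

Under this reading, an initial score of 146 gives 7.69, which is the value the report lists for the top hit.
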


The list of best scores is: 


                                                      Init.  Opt.
     Sequence Name  Description                Length Score  Score  Sig.  Frame

**** 7 standard deviations above mean ****

 1.  Q21695  Plasma membrane proton ATPase       2933   146    764   7.69   0
 2.  Q23313  DNA encoding masking protein        5136   141    942   7.38   0
 3.  Q11579  Encodes granulocyte colony st       2546   138    967   7.19   0
 4.  Q11580  Clone 25-1 encodes human G-CS       2931   138    967   7.19   0
 5.  Q13856  Human GCSF receptor gene in p       2942   138    971   7.19   0
 6.  N61379  Sequence encoding porcine bet        728   137    321   7.13   0
 7.  N60741  Sequence of porcine beta-foll        728   137    322   7.13   0
 8.  Q03847  Porcine beta FSH subunit.            780   137    339   7.13   0
 9.  Q28758  Partial sequence of tumour su       4328   135    953   7.01   0

**** 6 standard deviations above mean ****

10.  Q25975  MH mutant porcine ryanodine r      15377   134    994   6.94   0
11.  Q14755  FUS2 gene.                          2492   129    939   6.63   0
12.  N70128  Novel DNA encoding a polypept       1363   127    587   6.51   0
13.  N81162  Encodes Western subtype of ea       2418   126    790   6.45   0
14.  Q35297  ZYMV genome.                        9593   124    977   6.32   0
15.  Q29860  Odorant receptor clone 17.           983   123    393   6.26   0
16.  N71002  Sequence encoding a human gra        911   122    386   6.20   0
17.  B7A7Qn  U f-r-T one •AM A                  70 on  1 OO    oca   L on
18.  Q10263  pZ130 contg. Calgene lambda 1       4383   119    941   6.01   0
19.  Q11415  Ryanodine receptor gene.           15464   119    989   6.01   0

**** 5 standard deviations above mean ****

20.  N91839  Pasteurella multocida toxin g       4380   116    494   5.83   0
21.  Q21645  3' coding sequence of P.falci       1297   115    531   5.77   0
22.  N71064  Gene encoding Plasmodium viva       1908   115    707   5.77   0
23.  N40166  Sequence of A.awamori glucoam       3408   115    976   5.77   0
24.  Q10883  30kD TNF inhibitor precursor        2088   114    728   5.70   0
25.  Q10955  Encodes human 55kD TNF-bindin       2111   114    730   5.70   0
26.  Q06285  Human Tumour Necrosis Factor-       2141   114    730   5.70   0
27.  Q12215  Type 1 TNF receptor.                2176   114    725   5.70   0
28.  Q34941  Calgene Lambda 140 genomic cl       4383   113    941   5.64   0
29.  Q35143  Calgene lambda 140/pZ130 DNA        4383   113    944   5.64   0
30.  Q10319  Calgene lambda 140 genomic cl       4383   113    941   5.64   0
31.  Q20532  Sequence of clone lambdaAPCP1       2256   112    919   5.58   0
32.  Q10014  Clone lambda APCP168i4 of bet       2256   112    919   5.58   0
33.  N80604  Lambda APCP168i4, amino acids       2256   112    917   5.58   0
34.  Q05086  Sequence encodes NAP-2 gene a       2949   112    939   5.58   0
35.  N91050  Sequence encoding novel amylo       2949   112    937   5.58   0
36.  Q24442  Encodes truncated TNF-alpha 5        474   110    204   5.46   0
37.  Q24441  Encodes truncated TNF-alpha 5        608   110    203   5.46   0
38.  N90907  Glutamine synthesis gene.           1200   110    503   5.46   0
39.  Q06282  Plasmid Tumour Necrosis Facto       1334   110    427   5.46   0
40.  Q03599  Human liver cytochrome P-450        1813   110    453   5.46   0


1. ELLIS-012-FIG2AB.SEQ (1-2350)

Q21695 Plasma membrane proton ATPase.

ID Q21695 standard; DNA; 2933 BP.

AC Q21695; 

DT 02-JUN-1992 (first entry) 

DE Plasma membrane proton ATPase.

KW Antifungal agents; H+ ATPase; ss.

OS Candida albicans.

FH Key Location/Qualifiers

FT CDS 151..2842

FT /*tag= a

FT /product= H+ ATPase
PN EP-472286-A.

PD 26-FEB-1992.

PF 18-JUL-1991; 306542.

PR 18-JUL-1990; US-555123.

PA (MERI ) MERCK & CO INC.

PI Kurtz MB, Marrinan JA;

DR WPI; 92-066496/09.

DR P-PSDB; R21580.

PT New gene for evaluating antifungal agents - encodes Candida
PT albicans plasma membrane H-adenosine triphosphatase
PS Claim 2; Page 3; 25pp; English.

CC A large, single colony of Candida albicans ATCC 10261 was cultured
CC and chromosomal DNA extracted. The DNA was digested with restriction
CC enzymes and fragments probed with a fragment isolated from plasmid
CC B1138 contg. the Saccharomyces cerevisiae plasma membrane ATPase
CC (PMA1) gene in the pUC18 vector. Multiple restriction enzyme digests
CC showed the C. albicans DNA to be homologous to the S. cerevisiae
CC fragment. A library of C. albicans genomic DNA was constructed
CC (rich in the DNA encoding the plasma membrane proton ATPase) using
CC strain WO-1 and inserted into pEHBLY-23. A positive clone of 12-
CC 15 kb was ligated into the YEp24 vector, and transformed in E. coli
CC K-12 strain DH5 alpha. Recombinant plaques were isolated and
CC sequenced, showing a gene of 2.7 kb. The gene can be used to
CC transform non-pathogenic yeast which can be used to evaluate agents
CC capable of perturbing C. albicans plasma membrane H+ ATPase activity.
CC The gene also provides a means for producing large amounts of the

rr n! Jtifl: mcmhp.ifc ^** *■, * a » *- 


SQ Sequence 2933 BP; 758 A; 518 C; 633 G; 1024 T;

Initial Score = 146 Optimized Score = 764 Significance = 7.69

Residue Identity = 47% Matches = 940 Mismatches = 768

Gaps = 267 Conservative Substitutions = 0

460 470 480 490 500 X 510 520


GGACTGCAGGCCTGGCCAGGAGCTAACGAAGCAGGGTTGCAAAACCTGTAGCTTGGGAACATTTAATGACCA 


TCT ATCATTTGTTAA — 

X 10 

530 540 550 560 570 580 590 

GAACGGTACTGGCGTCTGTCGACCCTGGACGAACTGCTCTCTAGACGGAAGGTCTGTGCTTAAGACCGGGAC 

IN II I III i III I I I II I I II II I 

TATT TATTTATACCAAGCACCA TATAAATACCTAGTTTTTTTTTTTTTTTTTG 

20 30 40 50 60 


600 610 620 630 640 650 660 

CACGGAGAAGGACGTGGTGTGTGGACC — CCCTGTGGTGAGCTTCTCTCCCAGTACCA-CCATTTCTGTGA 

i ii ii i i I i i i i mi i i in it i i mi i i 

TTTGTTAAATCACTTTTTTTTTCAATCTTTGTTTTTGGTTAATTAATCT-TAAGAATAAGGGATTTTTATAT 
70 80 90 100 110 120 130 

670 680 690 700 710 720 

CTCCAGA GGGAGGACCA-GGAGGGCAC-TCCTTGCAGG-TCCTTACCTTGT-TCC — TGGCGCTGACA 

III III I I II II I III 1 II 1 II III II I III 
ATATATAAACCATGAGTGCTACTGAACCAACCAACGAAAAGGTTGATAAAATCGTCTCCGATGATGAAGACG 
140 150 160 170 180 190 200 210 

730 740 750 760 770 780 790 

TCGGCTTTG-CTGCT — GGCCCTGATCTTCATTACT — CTCCTGTTCTCTGTGCTCAAATGGATCAGGA-AA 

i ini i iii inn i n i n ii n n in i n n nil i 

AAGACATTGACCAATTAGTCGCTGATTTACAAT-CTAACCCAGGTGCT-GGTGATGAAGAAGAAGAGGAGGA 
220 230 240 250 260 270 280 

800 810 820 830 840 850 860 

AAATTCCCCCACATATTCAAGCAACCATTTAAGAAG-ACCACTGG-AGCAG-CTCAAGAGGAAGATGCTTGT 

nil i i i inn i n i inn i i in i i i i nn n n n i 

AAATGACTCTTC — CTTCAA--AGCCGTCCCAGAAGAATTATTGGAAACTGACCCAAG AGTTGGTT-T 

290 300 310 320 330 340 

870 880 890 900 910 920 930 

AGCTGCCGATG— TCCACAGGAAGAAGAAGGAGGAGGAGGAGGCTATGAGCTGTGATGTACTATCCTAGGAG 

III III! Ill MMI! II II II II 1 II I II I 1 I II I 

GACTGATGATGAAGTCACCAAAAGAAGAAAGA-GATACGGTTTGAATCAAATG-GCTGAAGAA-- -CAAGAAA 
350 360 370 380 390 400 410 

940 950 960 970 980 990 1000 

ATGTGTGGGCCGAAA-CCGAGAAGCACTAGGACCCCACCATCCTGTGGAACAGCACAAGCAACCCCACCACC 

I II I I III II I I II I III I II I I II II I I 

ACTTG-GTTCTTAAATTCGTCATGTTCTTTG TTGGTCCAATTCAATTCGTTATGGAA-GCCGCTGC- 

420 430 440 450 460 470 

1010 1020 1030 1040 1050 1060 1070 

CTGTTCTTACACATCATCCTAGATGA-TGTGTGGGCGCGCACCTCATCCAAG-TCTCTTCTAACGCTAACAT 

nil i i i i nn nnni tin m mm i i i 

-TGTTTTGGCTGCTGGT-TTAGAAGATTGGGTCGATTTCGGTGTTATCTGTGCTTTATTGTTATTGAATGCT
480 490 500 510 520 530 540 


1080 1090 1100 1110 1120 1130 1140 

AT — TTGTCTTT A-CCTTTTTTA — AATCTTTTTTTAAATTTAAATTTTATGTGTGTGAGTGTTTTGCCTGC 

i in nil n n n n n ii ii n n n n in in i 

TTTGTTGGTTTTATCCAAGAATACCAAGCTGGTTCT-ATTGTCGAT-GAAT-TGAAAAAGACTTTGGCCAAC 
550 560 570 580 590 600 610 



1150 1160 1170 1180 1190 1200 

CTGTATGCACACGTGTGTGTG TGTGTGTGTG-TGACACTCCTGATGCCTGAGGAGGTCAGAAGAGAAA 

I III I II III I II I I 111 I II I I I II I II I I II I 
— TCTGCTCTTGT-TGTTAGAAACGGTCAATTAGTTGAAATCCCAGCTAAC-GAAGTTGTTCCAGGTGATA 
620 630 640 650 660 670 680 

1210 1220 1230 1240 1250 1260 

GGGTTGGTTCCATAAGA — ACTGGA— GTTAT GGATGGCTG — TGAGCCGGNNNGATAGGTCGGG 

iii i ii ii ii it mu mu i ii ii m ii i 

-TCTTG — CAATTGGAAGACGGTACCGTTATTCCAACTGATGGTAGAATTGTTTCTG-AAGATTGTTTGTT 
690 700 710 720 730 740 


1270 1280 1290 1300 1310 1320 1330 

AC — GGAGACCTGTCTTCTTATTTTAAC — GTGACTGTATAATAAAAAAAAAATGATATTTCGGGAATTGTA 

ii i ii i m i mi ii mi i i ii i m ii i ii i i i 

ACAAGTTGATCAATCTGC-TATT — ACTGGTGAATCTTTAGCTGTCGACAAAAGAAGT — GGTGACTCTT 
750 760 770 780 790 800 810 

1340 1350 1360 1370 1380 1390 1400 

GAGATTGTCCTGACACCCTT CTAGTTAATGATCTAAGAGGAATTGTTGATACGTAGTATACTGTATAT 

I I I I II II II II II II I II II II II III 1 II III II I 
GTTACTCTTCTTCTACTGTTAAGACTGGTGAAGCCTTTATGA-TTGTTACTGCTAC-TGGT-GACTCTACTT 
820 830 840 850 860 870 880 

1410 1420 1430 1440 1450 1460 

GTGTATGTATA — TG-T-ATATGTATATATAAG — ACTCTTTTACTGTCAAAGTCAACCTAGAGTGTCTG- 

ii in i ii i i ii ii hi i ii mu i m ii in i ii 

TCGTCGGTAGAGCTGCTGCTTTGGTTAACAAAGCTTCCGCTGGTACTGGTCATTTCA--CTGAAGTCT-TGA 
890 900 910 920 930 940 950 

1470 1480 1490 1500 1510 1520 1530 

— GTTACCAGGTCAATTTTATTGGACATTTTACGTCACACACACACACACACACACACACACACGTTTATAC 

iii i i i ii mi i ii mi n i n i i 

ACGGTATTGGTACTACCTTGTTGGTCTTT GTCATTGTTACTTTGTTGGTCGTTTGGGTTGCTTGTTTC 

960 970 980 990 1000 1010 1020 

1540 1550 1560 1570 1580 1590 1600 

TAC-GTACTGTTATCGGTATTCTACGTCATATAATGGGATAGGGTAAAAGGAAACCAAAGAGTGAGTGAT — 

1 1 1 I I! 1111 I III I I II II II 1111 II I I 1 1 II III 1 

TACAGAACCGTTA — GAATTGTTC— CA-ATCTTGAGATACACTTTAGCCATCACTATTATTG-GTGTTCC 
1030 1040 1050 1060 1070 1080 

1610 1620 1630 1640 1650 1660 1670 

-ATTATTGTG-GAGGTGACAGACTACCCCTTC — TGGGTACGTAGGGACAGACCTCCTTCGGACTGTCTAAA 

n i n ii n i mi iii in i n n i i ii m i i n 

AGTTGGTTTGCCAGCTGTC— GTTACCACTACCATGGCT-- GTCGGTGCTG-CTTACTTGGCCAAGAAACAA 
1090 1100 1110 1120 1130 1140 1150 

1680 1690 1700 1710 1720 1730 

— ACTCCCCTTAGA-AGTCT— CGTCAAGTTCCCGGACGAAGAGGACAGAGGAGACACAGTCCGAAAAGTT 

II II II Mil 1 I I I II 1 I II I II II I I 

GCTATTGTCCAAAAATTGTCTGCCATTGAATCTTTGGCTGGTGTTGAAATCTTGTGTTCCGATAAAACCGGT 


1160 1170 1180 1190 1200 1210 1220

1740 1750 1760 1770 1780 1790 1800


ATTTTTCC GGCAAA — TCCTTTCCCTGTTTCGTGACACTCCACCCCTTGTGGA— CACTTGAGTGTC 


ACTTTGACCAAGAACAAATTGTCCTTGCAC-GAACCAT-ACACTGTTGAAGGTGTTGAACCAGATGACT-TG 
1230 1240 1250 1260 1270 1280 1290 

1810 1820 1830 1840 1850 1860 

ATCCTTGCGCCGGAAGGTCAGGTGGTACCCGTCTGTAG— GGGCGGGGA-GACA--GA GCCGCGGGG 

II III I I I I II II I I I I II III III Ml II III 

AT-GTTG-ACTGCTTGTTTAGCTGCTTCTAGAAAGAAGAAGGGTTTGGATGCCATTGATAAAGGTTTCTTGA 
1300 1310 1320 1330 1340 1350 1360 



1870 1880 1890 1900 1910 1920 1930 

GAGCTACGAGAATCGACTCACAGGGCGCCCCGGGCTTCGC--AAATGAAACTTTTTTAATCTCACAAGTTTC 

Ml II I I II I! II II llll II till II II II I II I Nil 

AATCTTTGATCAACTACCCA-AGAGCTAAAGCTGCTTTGCCAAAATACAAGGTTATTGAATTCCAACCTTTC 
1370 1380 1390 1400 1410 1420 1430 

1940 1950 1960 1970 1980 1990 2000

G-TCCGGGCTCGGCGGACCTATGGCGTCGATCCTTATTACCTTATCCTGGCGCCAAGATAAAACAACCAAAA 

I III I ill llll II I II II I I II I III I III II I 

GATCCTGTCTCCAAGAAAGT-TACTG-CTA — TTGTTGAATCA-CCAG AAGGTGAAAGAATTATTT 

1440 1450 1460 1470 1480 1490 

2010 2020 2030 2040 2050 2060 2070 

GCCTTGACTCCGGTAC-TAATTCTCCCTGCCGGCCCCCGTAAGCATAACGCGGCGATCTCCACTTTAAGAAC 

I II I III llll llll I III II II I III III llll 

GTGTTAAGGGTGCCCCATTATTCGTCTTAAAGACTGTTG-AAG-ATGACCACCCAATC-CCA — GAAGA-- 
1500 1510 1520 1530 1540 1550 

2080 2090 2100 2110 2120 2130 2140 

CTGGCCGCG— TTCTGCCTGGTCTCGCTTTCGTAAACGGTTCTTACAAAAGTAATTAGTTC-TTGCTTTCAG 

II II II II II I I II I II I III II I II I II II III I I 

-TGTCCACGAAAACTACCAAAACACCGTTGCCGAA — TTTGCTTCCAGAGGT-TTCAGATCTTTGGGTGTTG 
1560 1570 1580 1590 1600 1610 1620 

2150 2160 2170 2180 2190 2200 2210 

CCTCCAAG-CTTCTGCTAGTCTATGGCAGCATCAAGGCTGGTATT-TGCTACGGCTGACCGCTACGCCGCCG 

ii iii n in m i n i nun III I I II I I I I 

CCAGAAAGAGAGGTGAAGGTCACTGGGA-AATTTTG — GGT ATTATGCCATGT ATG — GATCCAC 

1630 1640 1650 1660 1670 1680 

2220 2230 2240 2250 2260 2270 

CAATA-AGGGTACTGGGCGGC — CCGTC — GAAG — GCCC-TTTGGTTTCAGAAACCCAAGG — CCCCCC 

in i i i nn n n i m nn i n inn m mi 

CAAGAGATGATACT--GCTGCCACAGTCAATGAAGCTAGAAGATTAGGTTTAAGAGTTAAGATGTTAACTGG 
1690 1700 1710 1720 1730 1740 1750 

2280 2290 2300 2310 2320 2330 2340 

TCATACC-AACGTTTCGACTTTGATTCTTGCCGGTACGTGGTGGTGGGTGCCTTAGCTCTT TCTCGAT 

Ml II III 1 II llll HIM Hill 1 III II 1 III 
TGATGCCGTTGGTATTGCTAAAGAAACTTGTCGTCAATTAGGTTTGGGTAC--TAACATTTACGATGCCGAC 
1760 1770 1780 1790 1800 1810 1820 


X 

AG-TTAGAC 

ii mi 

AGATTAGGTTTGTCCGGTGGTGGTGACATGGCTGGTTCTGAAATTGCTGATTTCGTTGA 
1830 1840 1850 1860 1870 1880 


2. ELLIS-012-FIG2AB.SEQ (1-2350) 

Q23313 DNA encoding masking protein high polymer unit pre 


ID Q23313 standard; DNA; 5136 BP.

AC Q23313;

DT 19-AUG-1992 (first entry)

DE DNA encoding masking protein high polymer unit precursor MPU-P.

KW Transforming growth factor beta; TGF-beta; mammalian cancer; ss.

OS Rattus rattus.

FH Key Location/Qualifiers

FT CDS 1..5136

FT /*tag= a

FT misc_feature 2209..4722

FT /*tag= b

FT /note= "N2514, encodes P838"

FT misc_feature 61..5136

FT /note= "N5076, encodes P1692"
PN J04066597-A.

PD 02-MAR-1992.

PF 29-JUN-1990; 173679.

PR 29-JUN-1990; JP-173679.

PA (NAKA/) NAKAMURA T.

DR WPI; 92-120902/15.

DR P-PSDB; R22461.

PT Masking protein high polymer unit - combines with transforming
PT growth factor beta produced by mammalian cancer cells to inhibit
PT them

PS Claim 13; Page 9; 25pp; Japanese.

CC The sequence codes for the precursor (MPU-P) of a masking protein
CC high polymer unit (MPU). The high polymer subunit MPU binds to
CC transforming growth factor (TGF) beta produced by mammalian cancer
CC cells. It may be used to inactivate the cancer cells and thus is
CC useful in the treatment of human cancers.

CC See also Q23314 and Q23315.

SQ Sequence 5136 BP; 1267 A; 1348 C; 1423 G; 1098 T; 
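
The per-base counts on an SQ line should add up to the declared sequence length, which makes a quick sanity check when a record has been transcribed or scanned. The sketch below parses a line in the layout shown above and compares the two totals; the function name and the parsing pattern are illustrative assumptions, not part of any GeneSeq tool.

```python
import re


def sq_declared_and_total(sq_line: str) -> tuple[int, int]:
    """Return (declared length, sum of per-residue counts) for an SQ line."""
    # Everything before the first ';' is the header, e.g. "SQ Sequence 5136 BP".
    header, counts_part = sq_line.split(";", 1)
    declared = int(re.search(r"\d+", header).group())
    # Each remaining field looks like "1267 A;".
    counts = re.findall(r"(\d+)\s+[A-Za-z]+\s*;", counts_part)
    return declared, sum(int(n) for n in counts)


line = "SQ Sequence 5136 BP; 1267 A; 1348 C; 1423 G; 1098 T;"
declared, total = sq_declared_and_total(line)
print(declared, total, declared == total)
```
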

Initial Score = 141 Optimized Score = 942 Significance = 7.38

Residue Identity = 43% Matches = 1139 Mismatches = 938

Gaps = 232 Conservative Substitutions = 0

X 10 20 

ATGT CCATGAACTGCTGA — GT 

in I! mi in n 

CCCGATGTGTGTAGGGACGGCCGCTGCATCAACACTCCTGGGGCCTTCCGATGCGAAT— ACTG-TGACAGT 
2870 2880 2890 2900 2910 2920 2930 

30 40 50 60 70 80 

GGATA AACAGCACGGGATATCTCTGTCTA-AAGGAATATT-ACT-ACACCAGGAAAAGGACACATT 

II II II Hill II II I I II I I II II III I I III 

GGGTACCGGATGTCACGACGGGGCCACTGTGAGGATATCGATGAGTGTCTGACCCCAAGTACCTGTCCCGAG 
2940 2950 2960 2970 2980 2990 3000 

90 100 110 120 130 140 150 

CGACAA-CAGGAAAGGAGCCTGTCACAGAAAACCACAGTGTCCTGTGCATGTGACATTTCGCCATG — GGA 

mi i ii ii i i mi nil i nil i n n n in 

GAACAATGCGTGAATTCCCCAGGTTC — TTACCAGTGTGTGCCCTGCACAGAAGGGTT— CCGTGGCTGGA 
3010 3020 3030 3040 3050 3060 3070 

160 170 180 190 200 210 

A — ACAACTGTTACAACGTGGTGGTCATTGTGCTGCTGCTAGTGGGCTGTGAGAA-GGTGGGAGC C 

i mi n i i mi i i n mi n i n nil n in n i 

ATGGACAA-TGCCTCGATGTGGACG — AGTG-CCTGCAGCCAAAGGTCTGTACCAATGGTTCCTGCACCAAC 


3080 3090 3100 3110 3120 3130 3140

220 230 240 250 260 270 280


GTGCAGAACTCC TGTGATAACTGTCAGCCTGG-TACTTTCTGCAGAAAATACAATCCAGTCTG-CAAG 

n i mi mi i in n n in i n i i n n i in nil 

CTGGAAGGCTCCTACATGTG-TTCCTGCCACAAGGGCTAC-AGCCCCACACCAGACCATAGACACTGTCAAG 
3150 3160 3170 3180 3190 3200 3210 


290 300 310 320 330 340 

A GCTGCCCTCCAAGTACCTTCTCC--AGCATAGGTGGACAGCCGAACTGTAACATCTGCAGAGTGTGT

I I II I II I Ii 1111 II ill II 1111 III I I I 

ATATTGATG-AATGTCAGCAAGGGAACCTGTGCATGAACGGGCAGTGCAA — AAACA-CTGACGGCTCCTT 
3220 3230 3240 3250 3260 3270 

350 360 370 380 390 400 410 

GCAGGCTATTTCAGGTTCAAGAAGTTTTGCTCCTCTACCCACAACGCGGAGTGTGAGTGCATTGAAGGATTC 

i i n i n n i ini n i n i i inn inn i n i 

CCGGTGTACCTGTGG-GCAGGGCTATCAGCT-GTCAGCGGCTAAAGACCAATGTGAAGATATTGACGAATGC 
3280 3290 3300 3310 3320 3330 3340 



420 430 440 450 460 470 480 

CATTGCTTGGGGCCACAG — TGCACCAGATGTG-AAAAGGACTGCAGGC-CTGGCCAGGAGCTAACGAAGCA 

I I I II I I ill I III I I III III I llll I II II 

GAGCAC-CGTCACCT CTGCTCTCACGGGCAGTGCAGGAACACAGAGGGCTCCTT CCAGTGTTTGTGCAACCA 

3350 3360 3370 3380 3390 3400 3410

490 500 510 520 530 540 550 

GGGTTGCAAA-- ACCTGTAGCTTGGGAACATTTAATGACCAGAACGGTACTGGCGTCTGTCGACCCTGGACG 

mu ii i inn mm 11 1 n i 11 1 i n n i n mi 

GGGTTACAGAGCATCTGT-GCTTGGAGAC — CACTG-CGAGGATATCAATGAATGCT-TGGA GGAC- 

3420 3430 3440 3450 3460 3470 3480 

560 570 580 590 600 610 620 

AACTGCTCTCTAGACGGAAGGTCTGTGC-TTAAGA — CCGGGACCACGGAGAAGGACGTG-GTGTGTGGAC 

ii i m ii mi hi i ii i i m ii ii i mu i mu 

-AGTAGTGTCTGCCAGGGAGGTGACTGCATCAATACAGCAGGGTCCTATGA-CTGCACGTGCCCGGATGGAC 
3490 3500 3510 3520 3530 3540 3550 

630 640 650 660 670 680 690 

CCCCTGTG-GTGAGCTTCTCTCCCAGTACCACCATTTCTGTGACTCCAGAGGGAGGACCAGGAGGGCACTCC 

II 11 III I I II llll II I II II Mill II Mil 

TCCAGCTGAATGA-CAATAAGGGCTGTCAAGACATTAATGAATGTGCACAGCCAGGACTCTGTGCAC-CTCA 

3560 3570 3580 3590 3600 3610 3620 

700 710 720 730 740 750 760 

T-TGCAGGTCCTTACCTTGTTCCTGGCGCTGACATCGGCTTTG-CTGCTGGCC — CTGATCTTCATTACTCT 

I III III I I llll II II II III II I I III III Mil 

TGGGGAGTGTCTAAAC--ACACAAGGCTC— ATTCCACTGTGTCTG-TGAACAAGGGTTCTCCAT— CTCT 


3630 3640 3650 3660 3670 3680

770 780 790 800 810 820


CC — TGTTC — TCT-GTGCTCAAATGGATCAG-GAAAAAATTCCCCCACATATTCAAGCAACCATTTAAGA 

I II II II III I II ill II I II II II II II II I 
GCAGATGGTCGTACTTGTGAAGATATTGATGAGTGTGTTAACAACACTGTGTGTGACAGTCACGGCTTCTG- 
3690 3700 3710 3720 3730 3740 3750 

830 840 850 860 870 880 890 

AGACCACTGGAGCAGCTCAAGAGGAAGATGCTTGTA GCTGCCGATGTCCACAGGAAGAAGAAGGAGG 

III II I llll II II II III I I I i 1 1 1 II I I llll I 
TGACAACACAGCCGGCTCTTTCCGCTGCCTCTGTTATCAGGGCTTTCAAGCCCCACAGGATGGGCAAGG-GT 
3760 3770 3780 3790 3800 3810 3820 

900 910 920 930 940 950 960 

AGGAGGAGGCTATGAGCTGTGATGTACTATCCTAGGAG-ATGTGTGGGCCGAAACCGAGAAGCACTAGGACC 

Mill I lllll II II III lllll II I I III I III 

GTGTGGATGTGAACGAATGTGAACTGC— TCAGTGGTGTATGTGGGGAGGCTTTCTGTGAA-AATGTGGAAG 


3830 3840 3850 3860 3870 3880 3890

970 980 990 1000 1010 1020 1030


CCACCATCCTGTG-GAACAGCACAAGCAACCCCACCACCCTGTTCTTACACATCATCCTAGATGATGTGTGG 

II lllllll I III III I II I I III I II I I llll I 

GGTCCTTCCTGTGCGTGTGTGCCGATGAGAACCAGGA GTACAGCCCCATGA — CTGG — GCAGTGTCG 

3900 3910 3920 3930 3940 3950 

1040 1050 1060 1070 1080 1090 

— GCGCGCACCT — CATCCAAGT CTCTTCTAACGCTAA-CATATTTGTCTTTACCTTTTTTAAATC 

II II II II II II Mil I I I II I I I II II till 
CTCCCGGGCTACTGAAGATTCAGGTGTGGATCGTC-AGCCCAAAGAAGAAAAGAAGGAGTGTTATTATAATC 
3960 3970 3980 3990 4000 4010 4020 4030 

1100 1110 1120 1130 1140 1150 1160 

TTTTTTTAAATTTAAATTTTATGTGTGTGAGTGTTTTGCCTGCC — TGTATGCACACGTGTGTGTGTGTGTG 

ii ill ii mi i nil i n n n n i in n n 

TCAAT — GATGCCA — GTCTCTGTGATAACGTGCTGGCCCCCAACGTCACCAAACAAGAGTG-CTG-CTG 
4040 4050 4060 4070 4080 4090 



1170 1180 1190 1200 1210 1220 1230 

TGTGTGACACT CCTGATGCCTGAGGAGGT CAGAAGAGAAAGGGTTGGTT — CCA-TA-AG — AACTG — GAG 

ii i it i ill iii i ii i ii i i ii iii i ii mi in 

TACATCGGGCGCC GGCTGGGGA-GACAATTGTGAGATCTTCCCTTGCCCAGTCCAGGGGACTGCTGAG 

4100 4110 4120 4130 4140 4150 4160 

1240 1250 1260 1270 1280 1290 

TTAT-GGATGGCTGTGAGCCGGNNNGATAGGT CGGGACGGAGACCTGTCTTCTTATTTTAACGTGA 

ii i in mi i i i mi i i mu u him i ii 

TTCTCGGA — AATGTGCCCTAGAGGAAAAGGTTTTGTCCCTGCTGGAGA — ATCCTCTTACGAAACCGGTG 
4170 4180 4190 4200 4210 4220 

1300 1310 1320 1330 1340 1350 1360 

CTGTATAATAAAAAAAAAATGA-TATTTC — GGGAATTGTAGAG — ATTGTCCTGACACCCTTCTAGTTAAT 

ii i ii in i in i i i i iii mi i i i i i n nil i 

GTGAGAACTACAAAGATGCTGACGAATGCCTGCTGTTTGGAGAGGAAATCTGCAAAAAC GGTTACT 

4230 4240 4250 4260 4270 4280 4290 

1370 1380 1390 1400 1410 1420 1430 

GATCTAAGAGGAATTGTTGATACGTAGTATACTGTATATG— TGTATGTA-TATGTATATGTATATATAAGA 

i i n i i i mm i inn iii i i n n i in i n ii 

GTTTGAACACTCAGCCTGGGTATGAATGCTACTGCA-AGGAAGGGACATACTACGATCCTGT-CAAATTACA 
4300 4310 4320 4330 4340 4350 4360 

1440 1450 1460 1470 1480 1490 

CTCTTTTACTGTCAAAGTCAACCTAGA— GTGTCTGGT-TA-CCAGGTCAATTTTATT-GGACATTTTACGT 

I till 1 1 II II III 1 II I 11 I 1 I 1 II 1 11 III II 
GTGTTTTGATATGGATGAATGCCAAGACCCTAACAGTTGTATCGATGGCCAGTGTGTTAATACAGAGGGC-T 
4370 4380 4390 4400 4410 4420 4430 


1500 1510 1520 1530 1540 1550 1560 

CACACACACACACACACACACACACACACGTTTATACTACGTACTGTTATCGGTATTCTAC-GTCATAT-AA 

i m i in mu n ii ini i i ii ii n in 

CTTACAACTGCTTTTGCACCCACCCAATGGTCCTGGATGCCT-CTGAGAAGAGATGTGTGCAGCCAACTGAA 
4440 4450 4460 4470 4480 4490 4500 


1570 1580 1590 1600 1610 1620 1630 1640 

TGGGATAGGGTAAAAGGAAACCAAAGAGTGAGTGATATTAT-TGTGGAGGTGACAGACTACCCCTTCTGGGT 

i n i in in i nil i i n nn nnii 1 in 

TCAAAT-GAACAAATAGAAGAAACCGA-TGTCTATCAAGATCTGTGCTGG-GA— GCATCTGAGTGAGGAGT 
4510 4520 4530 4540 4550 4560 4570 

1650 1660 1670 1680 1690 1700 1710 

ACGTAGGGACAGACCTCCT-TCGGACTGTCTAAAACTCCCCTTAGA-AGTCTCGTCAAGTTCCCGGACGAAG 

IIH I II 1 1111 1 1 I 1 II 1 I 1 III II I I III I III 1 

ACGT— GTGTAGCCGTCCTCTTGTA— GGCAAGCAGACGACATACACAGAGTGCTGCTGTT— TGTACGGGG 
4580 4590 4600 4610 4620 4630 

1720 1730 1740 1750 1760 1770 

AGGACAGAGGAGACACAGTCCGAAAAGT TATTTTTCCGGCAAAT-CCT-TTCCCTGTTTCGTGACACT 

m n n i nil i i n n n n ini ii in n 

AGG-CATGGGGCATGCAGTGTGCTCTCTGCCCCATGAAGGACTCAGATGACTATGCCCAGCT — GTG-CA — 
4640 4650 4660 4670 4680 4690 4700 

1780 1790 1800 1810 1820 1830 1840 

CCACCCCTTGTGGACACTTGAGTGTCATCC— TTGCGCCGGAAGGTCAGGTGGTAC— CCGT — CTGTAGG 

n m in tin i ii n n Mill inn n i n i in 

ACATCCC-TGT-GACAGGACGGCGGCGACCATATGGACGGGATGCGTTGGTGG-ACTTCAGTGAACAGTA-T 
4710 4720 4730 4740 4750 4760 4770 

1850 1860 1870 1880 1890 1900 

GGCGGGGAGACAGAGCCGCGGGGGAGCTACGAGAATCGACT — CACAGGGCGCCCCGG-GCTTC — GCAAAT 

in n inn n i i i n nn i 11 n n in i 11 i 

GGCCCAGAAACAGACCCTTACTTCA— TTC-AGGATCGCTTTCTAAACAGCTTTGAGGAGCTACAGGCTGAG 
4780 4790 4800 4810 4820 4830 4840 



1910 1920 1930 1940 1950 1960 1970 

GAAACTTTTTTAATCTCACAAGTTTCGTCCGGG — CTCGGCGGACCTATGGCGTCGATCCTTATTACCTTAT 

iii it mi iii ii i i i mi ii ii i ii ii mi 

GAA — TGTGGCATC-CTCAA CGGCTGTGAAAATGGCCG — CTGTGTAAGGGTTCAGGA — AGGTTAT 

4850 4860 4870 4880 4890 4900 

1980 1990 2000 2010 2020 2030 2040 

CCTGGC GCCAAGAT— AAAACAAC— CAAAAG-CCTTGA-CTCCGGTACTAATTCTCCCTGCCGGCCC 

II II II III II II I I I I II II II II I I I I II II 

ACTTGCGATTGCTTTGATGGATATCATCTGGATATGGCCAAGATGACCTGTGTTGA-TGTAAATGAATGCAG 
4910 4920 4930 4940 4950 4960 4970 

2050 2060 2070 2080 2090 2100 2110 

CCGTAAGCATAACGCGGCGATCTCCACTTTAAGAAC-CTGGCCGCGTTCTGCCTGGTCTCGCTTTCGTAAAC 

i i mi in . in i n nun i ini ii i i i i i i 

CGAGCTGAATAA-TCGGATGTCT-CTCTGCAAGAACGCCAAGTGCATTAACACAGAAGGCTCCTACAAATGC 
4980 4990 5000 5010 5020 5030 5040 

2120 2130 2140 2150 2160 2170 2180 

G-GTTCTTACAAAAGTAATTAGTTCTTGCTTTCAGCCTCCAAGCTTCTGCTAGTCTATGGCAGCATCAAGGC 

1 II III I 11 II 1 I II II I llli II III II II I I II 1 I 

GTGTGTCTACCAGGCTACGTA — CCAT-CTGACAAGC-CCAA-CTACTG-TACACCACTG-AACACC — GOT 
5050 5060 5070 5080 5090 5100 

2190 2200 2210 X 2220 2230 2240 2250 

TGGTATTTGCTACGGCTGACCGCTACGCCGCCGCAATAAGGGTACTGGGCGGCCCGTCGAAGGCCCTTTGGT 

i i mi n in i n i i i 

TTGAATTT-AGAC-AAAGACAGTGAC-CTGGAG 
5110 5120 5130 X 

2260 

TTCAGAAACCC 


3. ELLIS-012-FIG2AB.SEQ (1-2350)

Q11579 Encodes granulocyte colony stimulating factor rece

ID Q11579 standard; cDNA; 2546 BP.

AC Q11579;

DT 04-JUL-1991 (first entry)

DE Encodes granulocyte colony stimulating factor receptor.

KW G-CSF; receptor; clone D-7; ss.

OS Homo sapiens.

FH Key Location/Qualifiers

FT sig_peptide 165..236

FT /*tag= a

FT mat_peptide 237..2513

FT /*tag= b

FT /product= G-CSFR

PN WO9105046-A.

PD 18-APR-1991.

PF 24-SEP-1990; 005434.

PR 26-SEP-1989; US-412816.

PR 03-OCT-1989; US-416306.

PR 03-APR-1990; US-522952.

PA (IMHU-) IMMUNEX CORP.

PI Smith CA, Larsen AD, Curtis BM;

DR WPI; 91-132853/18.

DR P-PSDB; R11741.

PT Granulocyte-colony-stimulating factor (G-CSF) receptor DNA and
PT protein - useful as diagnostics and for regulating immune and
PT inflammatory responses

PS Claim 1; Fig 2,3,4,5; 34pp; English.

CC A cDNA library was constructed from cytoplasmic placental poly (A) +
CC RNA. Purified cDNA fragments were cloned into psfCAV vector for


CC plated to provide approximately 800 colonies per plate. The colonies
CC were harvested and each pool used to prepare plasmid DNA for
CC transfection into COS-7 cells. Transformants expressing biologically
CC active cell surface G-CSFR were identified by screening for ability
CC to bind 125-Iodine-G-CSF. Bacteria from a positive pool were plated
CC and plasmids prepared, COS-7 cells were transfected and a single
CC positive clone was identified and designated D-7. A glycerol stock
CC of bacteria transformed with this G-CSFR cDNA clone in expression
CC vector pCAV/NDT has been deposited as ATCC #68102.

CC See also Q11580.


SQ Sequence 2546 BP; 548 A; 844 C; 687 G; 467 T;


Initial Score = 138    Optimized Score = 967    Significance = 7.19
Residue Identity = 47%    Matches = 1196    Mismatches = 955
Gaps = 342    Conservative Substitutions = 0

10 20 30 40 50 60 70


ATGTCCATGAACTGCTGAGTGGATAAACAGCACGGGATATCTCTGTCTAAAGGAATATTACTACACCAGGAA 

I III I III I lllll I II II I III 

TGGACTGCA GCTGGTTTCAGGAACTTCTCTTGACGA-GAA 

X 10 20 30 


80 90 100 110 120 130 140 

AAGGACACATTCGACAACAGGAAAGGAGCCTGTCACAGAAAACCACAGTGTCCTGTGCATGTGA CATT 

II I II II II I I II I II II MINI III III I 

GAG-AGACCAAGGAGGCCAAGCAGGGGCTGGGCCAGAGGTGCCAACA-TG GGGAAACTGAGGCTCGGC 

40 50 60 70 80 90 100 

150 160 170 180 190 200 

TCGCCATGG-GAAACAACTGTTACAACGTGGTGGTCATTGTGCTGCTG CTAGT — GGGCTGTGAGAA 

mi i ii m mi i m i m mi ii i m i in ii ii 

TCGGAAAGGTGAAGTAACTTGTCCAA GATCACAAAGCTGGTGAACATCAAGTTGGTGCTATGGCAA 

110 120 130 140 150 160 170 

210 220 230 240 250 260 

GG-TGGGAGCCGTGCAGAACT--CCTGTGAT-AACTG-TCAGCCTGGTACT-TTCTGCAGAAAATA-CAATC

II Mill I lllll II Mil! Ill III Mil I II III! Ill 
GGCTGGGAAAC-TGCAG-CCTGACTTGGGCTGCCCTGATCATCCTGCTGCTCCCCGGAAGTCTGGAGGAGTG 
180 190 200 210 220 230 240 

270 280 290 300 310 320 330 

CAGTCTGCAAGAG — CT — GCCCTCCAAGTACCTTCTCCAGCATAGGTGGACAGCCGAACTGTAACATC-T 

ii i n n n mi m n n i ii mm m i mi 

CGGGC-ACATCAGTGTCTCAGCCC-CCATCGTCCACCTGGGGGATCCCATCACAGCC-TCCTGCATCATCAA 
250 260 270 280 290 300 310 

340 350 360 370 380 390 400 

GCAGAGTGTGTGCAGGCTATTTCAGGTTCAAGAAGTTTTGCTCCTCTACCCACAACGCGGAGTGTGAGTGCA 

inn mi ii ii i ii i ii i i n i ii ii ii i n 

GCAGA — ACTGCA-GCCAT— CTGGACCCGGAGCCACAGATTCTGTGGAGACTGGGAGCAGAGCTTCAGCC 
320 330 340 350 360 370 

410 420 430 440 450 460 470 

TTGAAGGATTCCATTGCTTGGGGCCACAGTGCACCAGATGTGAAAAGGACTGCAGGCCTGGCCAGGAGCT-A 

i i i I iiiii ii mi in i ii mi ii i n i 

CGGGGGCAGGCAGCAGCGTCTGTCTGATGGGACCCAG GAATCTATCATCA-CCCTGCCC— CACCTCA 


380 390 400 410 420 430 440

480 490 500 510 520 530 540


AC— GAAGCAGGGTTGCAAAACCTGTAGCTTGGGAAC — ATTTAATGACCAGAACGGTACTGGCGTCTGTCG 


n i mi i mi n i mi n n i i i i mi n i 

ACCACACTCAGGCCTTTCTCTCCTGCTGCCT— GAACTGGGGCAACAGCCTGCA-GATCCTGGACCAGGTTG 
450 460 470 480 490 500 510 

550 560 570 580 590 600 

ACCCTGGACGAA— CTGCTCTCTAG ACGGAAGGTCT-GTG-CTTAAGACCGGGACCACGGAGAAG


A-GCTGCGCGCAGGCTACCCTCCAGCCATACCCCACAACCTCTCCTGCCTCATGAACCTCACAAC-CAGCAG 
520 530 540 550 560 570 580 

610 620 630 640 650 660 670 

-GACGT-GGTGTGTGGACCCCCTGTGGTGAGCTTCTCTCCCA-GTACCACCATTTCTGTGA CTCCAG 

i i i mi m i mi i i ii i mi i i n in n n 

CCTCATCTGCCAGTGGGAGCCAGGACCTGAG— ACCCACCTACCCACCAGCTTCACTCTGAAGAGTTTCAAG 
590 600 610 620 630 640 650 

680 690 700 710 720 730 

AG-GGAGG — AC — CAG GAGGGCACTCCTTGCAGGTCCTTACCTTGTTCCTGGCGCTGACATCGGCT 

n i n n in mi inn i i n i ii n i n ii n ii 

AGCCGCGGCAACTGTCAGACCCAAGGGGACTCCATCCTGGAC — TGC-GTGCCCAAGG-ACGGGCAGAGCCA 
660 670 680 690 700 710 

740 750 760 770 780 790 

TTGCTGCTGGCCCTGATCTTCATTACTCTCCTGTTCT— CTG TGCTCAAATGGAT-CAGGAAAAAATT 

linn n i i n n nm ill n n in i mi i i i i 

CTGCTGCATCCC ACGCAAACAC-CTGCTGTTGTACCAGAATATGGGCATCTGGGTGCAGGCAGAGAAT 

720 730 740 750 760 770 780 

800 810 820 830 840 850 860 

CCCCCACATATTCAAGCAACCATTTAAGAAGACCACTG-GAGCAGCTCAAGAGGAAGATG CTTGTAGC 

II I I II III I II III! I I II III I II II III 

GCGCTGGGGA— CCAG CATGT— CCCCACAACTGTGTCTTGATCCCATGGATGTTGTGAAACTGGAGC 

790 800 810 820 830 840 

870 880 890 900 910 920 930

TGCCGATGTCCACAGGAAGAAGAA — GGAG-GAGGAGGAGG-CTATGAGCTGTGATGTACTATCCTAGGAGA 

n m i i in i i i n n i n i i i i i i n mi n 

CCCCCATG-CTGC-GGACCATGGACCCCAGCCCTGAAGCGGCCCCTCCCCAGGCAGG— CT-GCCTA-CAGC 
850 860 870 880 890 900 910 

940 950 960 970 980 990 1000 

TGTG-TGGGCCGAAACCGAGAAGCA-CTAGGACCCCACCATCCTGTGGAACAGCACAAGCAACCCCACCACC 

mi mi i n i in i in i n in n n in i mi i i 

TGTGCTGGG AGCCATG-- GCAGCCAGGCCTGCA-CATAAATCAGA-- -AGTGTGAGCTGCGCCACAAGC 

920 930 940 950 960 970 

1010 1020 1030 1040 1050 1060 1070 

CTGTTCTTACACATCATCCTAGATG — ATGTGTGGGCGCGCACCTCATCCAAGTCTCTTCTAACGCTAACA 

i i i i i i n n n i mm i i n i i n i nil i i i 

CGCAGCGTGGAGA— AGCC-AGCTGGGCACTGGTGGGCCCCCTCC-CCTTGGAGGCCCTTC— AGTATGAGC 
980 990 1000 1010 1020 1030 1040 

1080 1090 1100 1110 1120 1130 1140 

TATTTGTCTTTACCTTTTTTAAATCTTTTTTTAAATTTAAATTTTATGTGTGTG-AGTGTTTTGCCTG — C 

i i i i in i i i i n n i i i mm 

TCTGCGGGCT--CCTCCCAGCCACGGCCTACACCCTGCAGATACGCTGCATCCGCTGGCCCCTGCCTGGCCA
1050 1060 1070 1080 1090 1100 1110 

1150 1160 1170 1180 1190 1200 

CTGTATGCACACGTGTGTGTGTGTGTGTGTGTGACACTCCTGA — TGCCTGAGGAGGTCAGA — AGAGAAAG 

in i n i i i n n n in in ii mm n i i 

CTGGAGCGACTGGAGCCCCAGCCTGGAGCTGAGA-ACTACCGAACGGGCCCCCACTGTCAGACTGGACACAT 
1120 1130 1140 1150 1160 1170 1180 

1210 1220 1230 1240 1250 1260 1270 

GGTTGGTTCCATAAG-AACTGGAGTTATGGATGGCTGTG-AGCCGGNNNGATAG — GT CGGGACGGA 

II III II I I I Iflll III I III III 1 I II II 1 III I 1 

GG-TGGCGGCAGAGGCAGCTGGACCCCAGGA — CAGTGCAGCTGTTCTGGAAGCCAGTGCCCCTGGAGGAA 
1190 1200 1210 1220 1230 1240 1250 

1280 1290 1300 1310 1320 1330 

GAC--CTGTC--TTCTTATTTTAACGTGACTGTATAATAAAAAAAAAATGATATTTCGGGAATTGTAGAGAT


GACAGCGGACGGATCCAAGGTT-ATGTG — GTTTCTTGGA GACCCT-CAGGCCAGGCTGGGGC 

1260 1270 1280 1290 1300 1310 

1340 1350 1360 1370 1380 1390 1400 1410

TGTCCTGACACCCTTCTA-GTTAATGATCTAAGAGGAATTGTTGATACGTAGTATACTGTATATGTGTATGT 

lllll II II I I I II II II I I I II I I I I I I I I II 

CATCCTGCCCCTCTGCAACACCACAGAGCTCAGCTGCA — CCTTCCACCT-GCCTTCAGAAGCCCAGGAGGT 
1320 1330 1340 1350 1360 1370 1380 

1420 1430 1440 1450 1460 1470 

ATATGTATATGTATATATAAGACTCTTTTACTGTCAAAGTCAACCTAGAGTGTCT GGTTACCAG--GT 

i i mi ii ii n ii ii i i mi i i m i 

GGCCCTTGTGGCCTATAACTCAGCCGGGACCTCTC-GCCCCACCCCGGTG-GTCTTCTCAGAAAGCAGAGGC 
1390 1400 1410 1420 1430 1440 1450 


1480 1490 1500 1510 1520 1530 1540 

CAATTTTATTGGACATTTTACGTCACACACACACACACACACACACACACACGTTTATACT-ACGTACTGT 

III Mil II 1 II 1 I III Mil I 1 I II II II 

CCAGCT — CTGACCAGACTCCATGCCATGGCCCGAGACCCTCACAGCCTC-TGGGTAGGCTGGGAGCCCCCC 
1460 1470 1480 1490 1500 1510 1520 

1550 1560 1570 1580 1590 1600 1610 

TATCGGTATTCT-ACGTC-ATATAATGGGATAGGGTAAAAGGAAACCAAAGAGTGAGTGATATTA TT 

III I II I I I II I II II III II II II I III III I 
AATCCATGGCCTCAGGGCTATGTGAT-TGAGTGGGGCCTGGGCCCCCCCAGCGCGAGCAATAGCAACAAGAC 
1530 1540 1550 1560 1570 1580 1590 

1620 1630 1640 1650 1660 1670 

GTGGAGG-TG— ACAGACT — ACCCCTTCTGGGTACGTAGGGACAGACCTCCTTCGG-ACTGTCTAAAACT 

linn n inn i 1 i i nn i n n i 1 n n n t n 

CTGGAGGATGGAACAGAATGGGAGAGCCACGGGGTTTCTGCTGAAGGAGAACATCAGGCCCTTTC — AGCT 
1600 1610 1620 1630 1640 1650 1660 

1680 1690 1700 1710 1720 1730 1740 

CCCCTTAGAAGTC-TCGTCAAGTTCCC--GGACGAAGAGGACA-GAGGAGACACAGTCCGA-AAAGT-TAT- 

i i in n mi i nn i n mm i i in i in i i n mi 

-CTATGAGA — TCATCGTGA— CTCCCTTGTAC — CAGGACACCATGGGACCC — TCCCAGCATGTCTATG 
1670 1680 1690 1700 1710 1720 

1750 1760 1770 1780 1790 1800 

— TTTTC-CGGCAAA TCCTTTCCCTGTTTCGTGACACTCCACCCCTTGTGGACACTTGAGTGTCATCC 

I II I III III I II II 1 II II II II I III III III 

CCTACTCTCAAGAAATGGCTCCCTCCCATG-CCCCAGA-GCTGCA— TCTAAAGCACA-TTG — GCAAGAC 
1730 1740 1750 1760 1770 1780 

1810 1820 1830 1840 1850 1860 1870 

TTGCGC-CGGAAGGTCAGGTGGTACCCG-TCTGTAGGGGCGGGGAGACAGAGCCGCGGGGGAGCTACGAGA- 

II II I I II II III till III III II llllll I I I II I I 

CTGGGCACAGCTGG— AGTGGGTGCCTGAGCCCCCTGAGCTGGG-GA-AGAGCCCC— CTTACCCACTACAC 
1790 1800 1810 1820 1830 1840 1850 

1880 1890 1900 1910 1920 1930 

-ATCGACT-CAC-AGGGCGC CCCGGGCTTCGCAAATGAAACTTTTTTAAT-CTCAC-AAGTTTCGTCC 

III II II I II I III 1111 I I II I I III I I II 1111 

CATCTTCTGGACCAACGCTCAGAACCAGTCCTTCTC-CGCCATCCTGAATGCCTCCTCCCGTGGCTTTGTCC 

1860 1870 1880 1890 1900 1910 1920 

1940 1950 1960 1970 1980 1990 2000 

GGGCTCGGC-GGACCTATGGCGTC-GATCCTTAT-TACCTTAT-CCTGGC-GCCAAGATAAAACAACCAAAA 

i i i m i in i i n nil n in i nn i i i inn i 

TCCATGGCCTGGAGCCCGCCAGTCTGTATCACATCCACCTCATGGCTGCCAGCCAGGCTGGGGCCACCAACA 
1930 1940 1950 1960 1970 1980 1990 

2010 2020 2030 2040 2050 2060 2070 

GCCTTGACTCCGGTACTAATTCTGCCTGCCGGCCCCCGTAAGCATA-ACGCGGCG-ATCTCCACTTTAAGAA

II Hill I II I II I II I II I II III II

G — TACAGTCC TCACCCTGATG — ACCTTGACCCCAGAGGGGTCGGAGCTACAC — ATCAT
2000 2010 2020 2030 2040 2050

2080 2090 2100 2110 2120 2130 2140 

CCTGGCCGCGTTCTGCCTGGTCTCGCT-TTCGTAAACGGTTCTTACAAAAGTAATTAGTTCTTGCTTTCAGC 

Hill I till III II III II II I I I HI I I 111 II 1111 
CCTGGGCCTGTTCGGCC — TCCTGCTGTTGCTCACCTGCCTCTGTGGAACTGCCTGGCTCT — GTTGCAGC 
2060 2070 2080 2090 2100 2110 

2150 2160 2170 2180 2190 2200 2210 

CTCCAAGCTTCTG-CTAGTCTATGGC— AGCATCAAGGCTGGTATTTGCT-ACGGCTGAC— CGCTAC--GC 

I I II I I I II 1111 II II II ill II II I I III I I 

C-CCAACAGGAAGAATCCCCTCTGGCCAAGTGTC CCAGACCCAGCTCACAGCAGCCTGGGCTCCTGGG 

2120 2130 2140 2150 2160 2170 2180 

2220 2230 2240 2250 2260 2270 2280 

CGC-CGCAATAAGGGTACTGGGCGGCCCGTCGAAGGCCCTTTGGTTTCA-GAAACCCAAGGCCCCCCTCATA 

II I llll I II I II III I nil I II I lllll II 1111 
TGCCCACAATCATGG-AGGAGGATGCCTTCCAGCTGCCCGGCCTTGGCACGCCACCCATCACCAAGCTCA— 
2190 2200 2210 2220 2230 2240 2250 


2290 2300 2310 2320 2330 2340 

CCAACGTT— TCGACTTTG-ATTCTTGCCGGTACGTGGTGGTGGGTGCCTTAGCTCTTTCTCGATAGTTAGA 

ii i i ii ii i linn i hi i ii ini i i ii it 

-CAGTGCTGGAGGAGGATGAAAAGAAGCCGGT — GCCCTGG-GAGTCCCATAACAGCTCAGAGACCTGTGGC 
2260 2270 2280 2290 2300 2310 2320 


2350 

C 

I 

CTCCCCACTCTGGTCCAGACCTATGTGCTCCAGGGGGACCCAAGAGCAGTT 
X 2330 2340 2350 2360 2370 


4. ELLIS-012-FIG2AB.SEQ (1-2350)

Q11580 Clone 25-1 encodes human G-CSF receptor.

ID Q11580 standard; DNA; 2931 BP.
AC Q11580;
DT 04-JUL-1991 (first entry)
DE Clone 25-1 encodes human G-CSF receptor.
KW granulocyte colony stimulating factor; receptor; clone 25-1; ss.
OS Homo sapiens.
FH Key Location/Qualifiers
FT sig_peptide 165..236
FT /#tag= a
FT mat_peptide 237..2564
FT /#tag= b
PN WO9105046-A.
PD 18-APR-1991.
PF 24-SEP-1990; 005434.
PR 26-SEP-1989; US-412816.
PR 03-OCT-1989; US-416306.
PR 03-APR-1990; US-522952.
PA (IMMU-) IMMUNEX CORP.
PI Smith CA, Larsen AD, Curtis BM;
DR WPI; 91-132853/18.
DR P-PSDB; R11742.
PT Granulocyte-colony-stimulating factor (G-CSF) receptor DNA and
PT protein - useful as diagnostics and for regulating immune and
PT inflammatory responses
PS Claim 1; Fig 2, 3, 4, 5, 6; 34pp; English.

CC A cDNA library was constructed from cytoplasmic placental poly (A) + 
CC RNA. Purified cDNA fragments were cloned into psfCAV vector for 

CC i inr. i r.1 n C r.ll i TUJ^ r f ■* “t i m. T nv nr- r X 1 *' tr. m -» c 


CC plated to provide approximately 800 colonies per plate. The colonies
CC were harvested and each pool used to prepare plasmid DNA for
CC transfection into COS-7 cells. Transformants expressing biologically
CC active cell surface G-CSFR were identified by screening for ability
CC to bind 125-Iodine-G-CSF. Bacteria from a positive pool were plated
CC and plasmids prepared. COS-7 cells were transfected and a single
CC positive clone was identified and designated D-7. Clone D-7 was
CC used as a probe to screen the placental cDNA library; clone 25-1
CC was isolated. It is identical to D-7 except that it contains an
CC intron insertion after nucleotide 2411, resulting in a change of
CC reading frame (and of amino acid sequence).
CC See also Q11579.

SQ Sequence 2931 BP; 807 A; 991 C; 792 G; 541 T; 


Initial Score = 138    Optimized Score = 967    Significance = 7.19
Residue Identity = 47%    Matches = 1196    Mismatches = 955
Gaps = 342    Conservative Substitutions = 0

10 20 30 40 50 60 70


ATGTCCATGAACTGCTGAGTGGATAAACAGCACGGGATATCTCTGTCTAAAGGAATATTACTACACCAGGAA 


TGGACTGCA GCTGGTTTCAGGAACTTCTCTTGACGA-GAA 

X 10 20 30 

80 90 100 110 120 130 140 

AAGGACACATTCGACAACAGGAAAGGAGCCTGTCACAGAAAACCACAGTGTCCTGTGCATGTGA CATT 

II III II II I I II I II II I III II III III I 

GAG-AGACCAAGGAGGCCAAGCAGGGGCTGGGCCAGAGGTGCCAACA-TG GGGAAACTGAGGCTCGGC

40 50 60 70 80 90 100 

150 160 170 180 190 200 

TCGCCATGG-GAAACAACTGTTACAACGTGGTGGTCATTGTGCTGCTG CTAGT— GGGCTGTGAGAA 

ill i it ill mi i in i in mi n i in i in n n 

TCGGAAAGGTGAAGTAACTTGTCCAA GATCACAAAGCTGGTGAACATCAAGTTGGTGCTATGGCAA 


110 120 130 140 150 160 170

210 220 230 240 250 260


GG-TGGGAGCCGTGCAGAACT-CCTGTGAT-AACTG-TCAGCCTGGTACT-TTCTGCAGAAAATA-CAATC 

n inn i inn n mi i i m in nil in i i n i i i 

GGCTGGGAAAC-TGCAG-CCTGACTTGGGCTGCCCTGATCATCCTGCTGCTCCCCGGAAGTCTGGAGGAGTG 
180 190 200 210 220 230 240 

270 280 290 300 310 320 330 

CAGTCTGCAAGAG — CT— GCCCTCCAAGTACCTTCTCCAGCATAGGTGGACAGCCGAACTGTAACATC-T 

i i i n n n mi in n n i n linn in i nil 

CGGGC-ACATCAGTGTCTCAGCCC-CCATCGTCCACCTGGGGGATCCCATCACAGCC-TCCTGCATCATCAA 
250 260 270 280 290 300 310 

340 350 360 370 380 390 400 

GCAGAGTGTGTGCAGGCTATTTCAGGTTCAAGAAGTTTTGCTCCTCTACCCACAACGCGGAGTGTGAGTGCA 

inn nn n n i n i n i i n i n i i n i n 

GCAGA — ACTGCA-GCCAT— CTGGACCCGGAGCCACAGATTCTGTGGAGACTGGGAGCAGAGCTTCAGCC
320 330 340 350 360 370 

410 420 430 440 450 460 470 

TTGAAGGATTCCATTGCTTGGGGCCACAGTGCACCAGATGTGAAAAGGACTGCAGGCCTGGCCAGGAGCT-A 

i i i i n i i i it nn in i n nn n i n i 

CGGGGGCAGGCAGCAGCGTCTGTCTGATGGGACCCAG GAATCTATCATCA-CCCTGCCC--CACCTCA 


380 390 400 410 420 430 440

480 490 500 510 520 530 540


AC— GAAGCAGGGTTGCAAAACCTGTAGCTTGGGAAC — ATTTAATGACCAGAACGGTACTGGCGTCTGTCG 

n i nil i nil n i nn n imim nn n i 

ACCACACTCAGGCCTTTCTCTCCTGCTGCCT— GAACTGGGGCAACAGCCTGCA-GATCCTGGACCAGGTTG 
450 460 470 480 490 500 510


550 560 570 580 590 600

ACCCTGGACGAA— CTGCTCTCTAG ACGGAAGGTCT-GTG-CTTAAGACCGGGACCACGGAGAAG 

I III II I II I III II I II III II II I II I II II II II 

A-GCTGCGCGCAGGCTACCCTCCAGCCATACCCCACAACCTCTCCTGCCTCATGAACCTCACAAC-CAGCAG 

520 530 540 550 560 570 580 

610 620 630 640 650 660 670 

-GACGT-GGTGTGTGGACCCCCTGTGGTGAGCTTCTCTCCCA-GTACCACCATTTCTGTGA CTCCAG 

i i i mi ii i mi i i ii i mi i i ii in ii ii 

CCTCATCTGCCAGTGGGAGCCAGGACCTGAG— ACCCACCTACCCACCAGCTTCACTCTGAAGAGTTTCAAG 
590 600 610 620 630 640 650 

680 690 700 710 720 730 

AG-GGAGG— AC — CAG GAGGGCACTCCTTGCAGGTCCTTACCTTGTTCCTGGCGCTGACATCGGCT 

II I II II III till Hill I I II I II II I II I III II 

AGCCGCGGCAACTGTCAGACCCAAGGGGACTCCATCCTGGAC — TGC-GTGCCCAAGG-ACGGGCAGAGCCA 
660 670 680 690 700 710 

740 750 760 770 780 790 

TTGCTGCTGGCCCTGATCTTCATTACTCTCCTGTTCT— CTG TGCTCAAATGGAT-CAGGAAAAAATT 

mm ii i i n n inn 1 i i it 1 1 m i mi i i i i 

CTGCTGCATCCC ACGCAAACAC-CTGCTGTTGTACCAGAATATGGGCATCTGGGTGCAGGCAGAGAAT 

720 730 740 750 760 770 780 

800 810 820 830 840 850 860 

CCCCCACATATTCAAGCAACCATTTAAGAAGACCACTG-GAGCAGCTCAAGAGGAAGATG CTTGTAGC 

II I Ml III I II 1111 I I II III I II II III 

GCGCTGGGGA— CCAG CATGT— CCCCACAACTGTGTC-TTGATCCCATGGATGTTGTGAAACTGGAGC 

790 800 810 820 830 840 

870 880 890 900 910 920 930 

TGCCGATGTCCACAGGAAGAAGAA— GGAG-GAGGAGGAGG-CTATGAGCTGTGATGTACTATCCTAGGAGA 

II III I I III I I I II II I II I I I I I I II 1111 II 

CCCCCATG-CTGC-GGACCATGGACCCCAGCCCTGAAGCGGCCCCTCCCCAGGCAGG — CT-GCCTA-CAGC 
850 860 870 880 890 900 910 

940 950 960 970 980 990 1000

TGTG-TGGGCCGAAACCGAGAAGCA-CTAGGACCCCACCATCCTGTGGAACAGCACAAGCAACCCCACCACC 

mi mi i n i m i in i n in ii ii in i mi i i 

TGTGCTGGG AGCCATG — GCAGCCAGGCCTGCA-CATAAATCAGA — AGTGTGAGCTGCGCCACAAGC 

920 930 940 950 960 970 

1010 1020 1030 1040 1050 1060 1070 

CTGTTCTTACACATCATCCTAGATG — ATGTGTGGGCGCGCACCTCATCCAAGTCTCTTCTAACGCTAACA 

i i i i i i n n it i mm i i ii i i ii i mi i i i 

CGCAGCGTGGAGA— AGCC-AGCTGGGCACTGGTGGGCCCCCTCC-CCTTGGAGGCCCTTC— AGTATGAGC 
980 990 1000 1010 1020 1030 1040 

1080 1090 1100 1110 1120 1130 1140 

TATTTGTCTTTACCTTTTTTAAATCTTTTTTTAAATTTAAATTTTATGTGTGTG-AGTGTTTTGCCTG — C 

i i i i in i i i i ii ii i i i mm 

TCTGCGGGCT— CCTCCCAGCCACGGCCTACACCCTGCAGATACGCTGCATCCGCTGGCCCCTGCCTGGCCA 
1050 1060 1070 1080 1090 1100 1110 

1150 1160 1170 1180 1190 1200 

CTGTATGCACACGTGTGTGTGTGTGTGTGTGTGACACTCCTGA — TGCCTGAGGAGGTCAGA — AGAGAAAG 

in i n i i i n n n in i n ii mm n i i 

CTGGAGCGACTGGAGCCCCAGCCTGGAGCTGAGA-ACTACCGAACGGGCCCCCACTGTCAGACTGGACACAT 
1120 1130 1140 1150 1160 1170 1180 

1210 1220 1230 1240 1250 1260 1270 

GGTTGGTTCCATAAG-AACTGGAGTTATGGATGGCTGTG-AGCCGGNNNGATAG — GT CGGGACGGA 

I! Ill II 1 I I Hill III I III III I 1 II II I III I I 

GG-TGGCGGCAGAGGCAGCTGGACCCCAGGA — CAGTGCAGCTGTTCTGGAAGCCAGTGCCCCTGGAGGAA 
1190 1200 1210 1220 1230 1240 1250 



1280 1290 1300 1310 1320 1330

GAC — CTGTC — TTCTTATTTTAACGTGACTGTATAATAAAAAAAAAATGATATTTCGGGAATTGTAGAGAT 

1IIMI III !! I III I! I i I II I I li III 

GACAGCGGACGGATCCAAGGTT-ATGTG — GTTTCTTGGA GACCCT-CAGGCCAGGCTGGGGC 

1260 1270 1280 1290 1300 1310 

1340 1350 1360 1370 1380 1390 1400 1410 

TGTCCTGACACCCTTCTA-GTTAATGATCTAAGAGGAATTGTTGATACGTAGTATACTGTATATGTGTATGT 

Hill I I II I I I II II II I I I II I I I I || | | II 

CATCCTGCCCCTCTGCAACACCACAGAGCTCAGCTGCA — CCTTCCACCT-GCCTTCAGAAGCCCAGGAGGT 
1320 1330 1340 1350 1360 1370 1380 

1420 1430 1440 1450 1460 1470 

ATATGTATATGTATATATAAGACTCTTTTACTGTCAAAGTCAACCTAGAGTGTCT GGTTACCAG— GT 

i i mi ii mi ii ii ii mi i i m i 

GGCCCTTGTGGCCTATAACTCAGCCGGGACCTCTC-GCCCCACCCCGGTG-GTCTTCTCAGAAAGCAGAGGC 
1390 1400 1410 1420 1430 1440 1450 


1480 1490 1500 1510 1520 1530 1540 

CAATTTTATTGGACATTTTACGTCACACACACACACACACACACACACACACGTTTATACT — ACGTACTGT 

III II II I 1 I II I I II 1 Mil I I I II II II 

CCAGCT— CTGACCAGACTCCATGCCATGGCCCGAGACCCTCACAGCCTC-TGGGTAGGCTGGGAGCCCCCC 
1460 1470 1480 1490 1500 1510 1520

1550 1560 1570 1580 1590 1600 1610 

TATCGGTATTCT-ACGTC-ATATAATGGGATAGGGTAAAAGGAAACCAAAGAGTGAGTGATATTA TT

III I II I I I II I II II III II II II I III III | 
AATCCATGGCCTCAGGGCTATGTGAT-TGAGTGGGGCCTGGGCCCCCCCAGCGCGAGCAATAGCAACAAGAC 
1530 1540 1550 1560 1570 1580 1590 

1620 1630 1640 1650 1660 1670 

GTGGAGG-TG — ACAGACT — ACCCCTTCTGGGTACGTAGGGACAGACCTCCTTCGG-ACTGTCTAAAACT 

mm ii mu i i i i im i n n i i n n n i 11 

CTGGAGGATGGAACAGAATGGGAGAGCCACGGGGTTTCTGCTGAAGGAGAACATCAGGCCCTTTC — AGCT 
1600 1610 1620 1630 1640 1650 1660 

1680 1690 1700 1710 1720 1730 1740 

CCCCTTAGAAGTC-TCGTCAAGTTCCC--GGACGAAGAGGACA-GAGGAGACACAGTCCGA-AAAGT-TAT-

i i in ii mi i nn i n mm i i in i in i i n m 

-CTATGAGA — TCATCGTGA — CTCCCTTGTAC — CAGGACACCATGGGACCC--TCCCAGCATGTCTATG 
1670 1680 1690 1700 1710 1720 

1750 1760 1770 1780 1790 1800 

— TTTTC-CGGCAAA TCCTTTCCCTGTTTCGTGACACTCCACCCCTTGTGGACACTTGAGTGTCATCC 

I II 1 III III I II II' I II II II II I III III III 

CCTACTCTCAAGAAATGGCTCCCTCCCATG-CCCCAGA-GCTGCA— TCTAAAGCACA-TTG — GCAAGAC 
1730 1740 1750 1760 1770 1780 

1810 1820 1830 1840 1850 1860 1870

TTGCGC-CGGAAGGTCAGGTGGTACCCG-TCTGTAGGGGCGGGGAGACAGAGCCGCGGGGGAGCTACGAGA- 

II II I I II II III II I I I II III II llllll I I I II I I 

CTGGGCACAGCTGG — AGTGGGTGCCTGAGCCCCCTGAGCTGGG-GA-AGAGCCCC — CTTACCCACTACAC 
1790 1800 1810 1820 1830 1840 1850 

1880 1890 1900 1910 1920 1930 

-ATCGACT-CAC-AGGGCGC CCCGGGCTTCGCAAATGAAACTTTTTTAAT-CTCAC-AAGTTTCGTCC 

III II II I II I III till I I II I Mil I I II llil 

CATCTTCTGGACCAACGCTCAGAACCAGTCCTTCTC-CGCCATCCTGAATGCCTCCTCCCGTGGCTTTGTCC 

1860 1870 1880 1890 1900 1910 1920 

1940 1950 1960 1970 1980 1990 2000 

GGGCTCGGC-GGACCTATGGCGTC-GATCCTTAT-TACCTTAT-CCTGGC-GCCAAGATAAAACAACCAAAA 

i i i m i in i i n nn n m i nil i i i inn i 

TCCATGGCCTGGAGCCCGCCAGTCTGTATCACATCCACCTCATGGCTGCCAGCCAGGCTGGGGCCACCAACA 
1930 1940 1950 1960 1970 1980 1990 



2010 2020 2030 2040 2050 2060 2070 

GCCTTGACTCCGGTACTAATTCTGCCTGCCGGCCCCCGTAAGCATA-ACGCGGCG-ATCTCCACTTTAAGAA

i i i in li mu i in i i i i i i ii i ii m ii 

G — TACAGTCC TCACCCTGATG — ACCTTGACCCCAGAGGGGTCGGAGCTACAC — ATCAT 

2000 2010 2020 2030 2040 2050 

2080 2090 2100 2110 2120 2130 2140 

CCTGGCCGCGTTCTGCCTGGTCTCGCT-TTCGTAAACGGTTCTTACAAAAGTAATTAGTTCTTGCTTTCAGC 

mu i mi m ii m ii i ii i i ii i ii hi ii ini 

CCTGGGCCTGTTCGGCC — TCCTGCTGTTGCTCACCTGCCTCTGTGGAACTGCCTGGCTCT — GTTGCAGC 
2060 2070 2080 2090 2100 2110 

2150 2160 2170 2180 2190 2200 2210 

CTCCAAGCTTCTG-CTAGTCTATGGC--AGCATCAAGGCTGGTATTTGCT-ACGGCTGAC--CGCTAC--GC

i nil i i ii mi ii n ii m it ii i i m i i 

C-CCAACAGGAAGAATCCCCTCTGGCCAAGTGTC CCAGACCCAGCTCACAGCAGCCTGGGCTCCTGGG

2120 2130 2140 2150 2160 2170 2180 

2220 2230 2240 2250 2260 2270 2280 

CGC-CGCAATAAGGGTACTGGGCGGCCCGTCGAAGGCCCTTTGGTTTCA-GAAACCCAAGGCCCCCCTCATA 

II I till Mil II HI 1 III) I II 1 lllll II 1111 
TGCCCACAATCATGG-AGGAGGATGCCTTCCAGCTGCCCGGCCTTGGCACGCCACCCATCACCAAGCTCA— 
2190 2200 2210 2220 2230 2240 2250 


2290 2300 2310 2320 2330 2340 

CCAACGTT — TCGACTTTG-ATTCTTGCCGGTACGTGGTGGTGGGTGCCTTAGCTCTTTCTCGATAGTTAGA 

ii i i ii ii i mm i m i ii ii ii i i ii ii 

-CAGTGCTGGAGGAGGATGAAAAGAAGCCGGT--GCCCTGG-GAGTCCCATAACAGCTCAGAGACCTGTGGC
2260 2270 2280 2290 2300 2310 2320 


2350 

C 

I 

CTCCCCACTCTGGTCCAGACCTATGTGCTCCAGGGGGACCCAAGAGCAGTT 
X 2330 2340 2350 2360 2370 


5. ELLIS-012-FIG2AB.SEQ (1-2350)

Q13856 Human GCSF receptor gene in pHQ3/pHG12.

ID Q13856 standard; DNA; 2942 BP.
AC Q13856;
DT 08-JAN-1992 (first entry)
DE Human GCSF receptor gene in pHQ3/pHG12.
KW Granulocyte colony stimulating factor; ss.
OS Homo sapiens.
FH Key Location/Qualifiers
FT sig_peptide 169..237
FT /#tag= a
FT mat_peptide 238..2676
FT /#tag= b
PN WO9114776-A.
PD 03-OCT-1991.
PF 22-MAR-1991; J00375.
PR 23-MAR-1990; JP-074539.
PR 03-JUL-1990; JP-176629.
PA (OSAB-) OSAKA BIOSCIENCE IN.
PI Nagata S, Fukunaga R;
DR WPI; 91-310576/42.
DR P-PSDB; R14255.
PT DNA encoding granulocyte colony stimulating factor receptor - for
PT recombinant prodn. of GCSF receptor useful in therapy and
PT research.
PS Claim 1; Fig 8; 99pp; Japanese.

CC The sequence was obtd. from a cDNA library prepd. from human

rr li j c ♦ <rirn4 i r 1 i 3 , 11Q77 r-e.1t e HMA N,nm t K* 


CC murine gene (see Q13855). The genes can be used to produce

CC recombinant receptors for use in research and for diagnostic assays. 

CC See also Q13S57 and Q13853. 


SQ Sequence 2942 BP; 611 A; 993 C; 796 G; 542 T;


Initial Score = 138    Optimized Score = 971    Significance = 7.19
Residue Identity = 48%    Matches = 1203    Mismatches = 951
Gaps = [illegible]    Conservative Substitutions = 0

10 20 X 30 40 50 60 70


ATGTCCATGAACTGCTGAGTGGATAAACAGCACGGGATATCTCTGTCTAAAGGAATATTACTACACCAGGAA 


GAAGCTGGACTGCA GCTGGTTTCAGGAACTTCTCTTGACGA-GAA 

X 10 20 30 40 

80 90 100 110 120 130 

AAGGACACATTCGACAACAGGAAAGGAGCCTGTCACAGA-AAACCACAGTGTCCTGTGCATGTGA CAT 

it mi ii ii i i iii iii mi iiiii ii i i i in i 

GAG-AGACCAAGGAGGCCAAGCAGGG-G-CTGGGCCAGAGGTGCCACA-TG GGGAAACTGAGGCTCGG 

50 60 70 80 90 100 

140 150 160 170 180 190 200 

TTCGCCATGG-GAAACAACTGTTACAACGTGGTGGTCATTGTGCTGCTG CTAGT— GGGCTGTGAGA 

iii i ii iii mi i iii i iii mi ii i iii i m n i 

CTCGGAAAGGTGAAGTAACTTGTCCAA GATCACAAAGCTGGTGAACATCAAGTTGGTGCTATGGCA 

110 120 130 140 150 160 170 

210 220 230 240 250 260 

AGG-TGGGAGCCGTGCAGAACT— CCTGTGAT-AACTG-TCAGCCTGGTACT-TTCTGCAGAAAATA-CAAT 

III Hill I Illli II I II I I III III INI III I I II III 
AGGCTGGGAAAC-TGCAG-CCTGACTTGGGCTGCCCTGATCATCCTGCTGCTCCCCGGAAGTCTGGAGGAGT 
180 190 200 210 220 230 240 

270 280 290 300 310 320 330 

CCAGTCTGCAAGAG — CT—GCCCTCCAAGTACCTTCTCCAGCATAGGTGGACAGCCGAACTGTAACATC- 

i i i ii ii ii mi in n n i n mm in i nil 

GCGGGC-ACATCAGTGTCTCAGCCC-CCATCGTCCACCTGGGGGATCCCATCACAGCC-TCCTGCATCATCA 
250 260 270 280 290 300 310 

340 350 360 370 380 390 400 

TGCAGAGTGTGTGCAGGCTATTTCAGGTTCAAGAAGTTTTGCTCCTCTACCCACAACGCGGAGTGTGAGTGC 

iiiii nil n n i n i n i i n i n i i n i n 

AGCAGA — ACTGCA-GCCAT— CTGGACCCGGAGCCACAGATTCTGTGGAGACTGGGAGCAGAGCTTCAGC 
320 330 340 350 360 370 

410 420 430 440 450 460 470 

ATTGAAGGATTCCATTGCTTGGGGCCACAGTGCACCAGATGTGAAAAGGACTGCAGGCCTGGCCAGGAGCT- 

i i i i n i i i ii mi in i n mi n i n 

CCGGGGGCAGGCAGCAGCGTCTGTCTGATGGGACCCAG GAATCTATCATCA-CCCTGCCC— CACCTC 

380 390 400 410 420 430 440 

480 490 500 510 520 530 540 

AAC — GAAGCAGGGTTGCAAAACCTGTAGCTTGGGAAC— ATTTAATGACCAGAACGGTACTGGCGTCTGTC 

in i mi i nil n i nil n mini iiii ii 

AACCACACTCAGGCCTTTCTCTCCTGCTGCCT--C-AACTGGGGCAACAGCCTGCA-GATCCTGGACCAGGTT 
450 460 470 480 490 500 510 

550 560 570 580 590 600 

GACCCTGGACGAA— CTGCTCTCTAG ACGGAAGGTCT-GTG-CTTAAGACCGGGACCACGGAGAA 

II III II I II I III II I II III II II I II 1 II II II I 
GA-GCTGCGCGCAGGCTACCCTCCAGCCATACCCCACAACCTCTCCTGCCTCATGAACCTCACAAC-CAGCA 
520 530 540 550 560 570 580 


610 620 630 640 650 660 670

G-GACGT-GGTGTGTGGACCCCCTGTGGTGAGCTTCTCTCCCA-GTACCACCATTTCTGTGA CTCCA

! Ill Mil iiiii 1 1 1 1 i i ii mi i i

GCCTCATCTGCCAGTGGGAGCCAGGACCTGAG — ACCCACCTACCCACCAGCTTCACTCTGAAGAGTTTCAA 
590 600 610 620 630 640 650 

680 690 700 710 720 730 

GAG--GGAG-GAC — CAG GAGGGCACTCCTTGCAGGTCCTTACCTTGTTCCTGGCGCTGACATCGGC 

III II I II III till HIM I I II I II II I II I I II II 

GAGCCGGGGCAACTGTCAGACCCAAGGGGACTCCATCCTGGAC — TGC-GTGCCCAAGG-ACGGGCAGAGCC 
660 670 680 690 700 710 720 

740 750 760 770 780 790 

TTTGCTGCTGGCCCTGATCTTCATTACTCTCCTGTTCT— CTG TGCTCAAATGGAT-CAGGAAAAAAT 

linn ii i i ii ii mu iii ii ii iii i mi i i i 

ACTGCTGCATCCC ACGCAAACAC-CTGCTGTTGTACCAGAATATGGGCATCTGGGTGCAGGCAGAGAA 


730 740 750 760 770 780

800 810 820 830 840 850 860


TCCCCCACATATTCAAGCAACCATTTAAGAAGACCACTG-GAGCAGCTCAAGAGGAAGATG CTTGTAG 

III I 1 II III I II 1111 I I II III I II || || 
TGCGCTGGGGA — CCAG CATGT — CCCCACAACTGTGTCTTGATCCCATGGATGTTGTGAAACTGGAG 


790 800 810 820 830 840 850

870 880 890 900 910 920 930


CTGCCGATGTCCACAGGAAGAAGAA-- -GGAG-GAGGAGGAGG-CTATGAGCTGTGATGTACTATCCTAGGAG 

i ii m i i in ii i ii ii i ii i i i i i i ii mi n 

CCCCCCATG-CTGC-GGACCATGGACCCCAGCCCTGAAGCGGCCCCTCCCCAGGCAGG — CT-GCCTA-CAG 


860 870 880 890 900 910

940 950 960 970 980 990 1000


ATGTG-TGGGCCGAAACCGAGAAGCA-CTAGGACCCCACCATCCTGTGGAACAGCACAAGCAACCCCACCAC 

mi mi i n i in i in i n in n n in i mi i 

CTGTGCTGGG AGCCATG — GCAGCCAGGCCTGCA-CATAAATCAGA — AGTGTGAGCTGCGCCACAAG 

920 930 940 950 960 970 980 

1010 1020 1030 1040 1050 1060 1070 

CCTGTTCTTACACATCATCCTAGATG — ATGTGTGGGCGCGCACCTCATCCAAGTCTCTTCTAACGCTAAC 

n ( i i i in ii n i mm i i n i i in nil i i i 

CCGCAGCGTGGAGA— AGCC-AGCTGGGCACTGGTGGGCCCCCTCC-CCTTGGAGGCCCTTC--AGTATGAG 
990 1000 1010 1020 1030 1040 

1080 1090 1100 1110 1120 1130 

ATATTTGTCTTTACCTTTTTTAAATCTTTTTTTAAATTTAAATTTTATGTGTGTG-AGTGTTTTGCCTG —

I I 1 I III I I I I II II I I I llllll 

CTCTGCGGGCT — CCTCCCAGCCACGGCCTACACCCTGCAGATACGCTGCATCCGCTGGCCCCTGCCTGGCC 
1050 1060 1070 1080 1090 1100 1110 

1140 1150 1160 1170 1180 1190 1200 

CCTGTATGCACACGTGTGTGTGTGTGTGTGTGTGACACTCCTGA— TGCCTGAGGAGGTCAGA — AGAGAAA 

in i n i i i n n n in i n ii linn n i i 

ACTGGAGCGACTGGAGCCCCAGCCTGGAGCTGAGA-ACTACCGAACGGGCCCCCACTGTCAGACTGGACACA 
1120 1130 1140 1150 1160 1170 1180 

1210 1220 1230 1240 1250 1260 1270 

GGGTTGGTTCCATAAG-AACTGGAGTTATGGATGGCTGTG-AGCCGGNNNGATAG — GT CGGGACGG 

II III II I I I Hill III I III III I I II II l III I 

TGG-TGGCGGCAGAGGCAGCTGGACCCCAGGA — CAGTGCAGCTGTTCTGGAAGCCAGTGCCCCTGGAGGA 
1190 1200 1210 1220 1230 1240 1250 

1280 1290 1300 1310 1320 1330 

AGAC— CTGTC — TTCTTATTTTAACGTGACTGTATAATAAAAAAAAAATGATATTTCGGGAATTGTAGAGA 

mi i i i n i n i in n i i i n i i n in 

AGACAGCGGACGGATCCAAGGTT-ATGTG — GTTTCTTGGA GACCCT-CAGGCCAGGCTGGGG

1260 1270 1280 1290 1300 1310 

1340 1350 1360 1370 1380 1390 1400 

TTGTCCTGACACCCTTCTA-GTTAATGATCTAAGAGGAATTGTTGATACGTAGTATACTGTATATGTGTATG 

< 1 } I 1 I 1 II t 1 I 


! I M I f 


t I 


fit! 


fill 


! I I 



CCATCCTGCCCCTCTGCAACACCACAGAGCTCAGCTGCA — CCTTCCACCT-GCCTTCAGAAGCCCAGGAGG 
1320 1330 1340 1350 1360 1370 1380 

1410 1420 1430 1440 1450 1460 1470 

TATATGTATATGTATATATAAGACTCTTTTACTGTCAAAGTCAACCTAGAGTGTCT GGTTACCAG— G 

1 I I Nil II INI II II I I III! I I III I 
TGGCCCTTGTGGCCTATAACTCAGCCGGGACCTCTC-GCCCCACCCCGGTG-GTCTTCTCAGAAAGCAGAGG
1390 1400 1410 1420 1430 1440 1450 


1480 1490 1500 1510 1520 1530 1540 

TCAATTTTATTGGACATTTTACGTCACACACACACACACACACACACACACACGTTTATACT--ACGTACTG 


II I II I! I I I II I I II I till II I II II II 
CCCAGCT— CTGACCAGACTCCATGGCATGGCCCGAGACCCTCACAGCCTC-TGGGTAGGCTGGGAGCCCCC 
1460 1470 1480 1490 1500 1510 1520 


1550 1560 1570 1580 1590 1600 1610 

TTATCGGTATTCT-ACGTC-ATATAATGGGATAGGGTAAAAGGAAACCAAAGAGTGAGTGATATTA T 

III I II I I I II I II II III II li II I III III | 
CAATCCATGGCCTCAGGGCTATGTGAT-TGAGTGGGGCCTGGGCCCCCCCAGCGCGAGCAATAGCAACAAGA 
1530 1540 1550 1560 1570 1580 1590 


1620 1630 1640 1650 1660 1670 

TGTGGAGG-TG— ACAGACT — ACCCCTTCTGGGTACGTAGGGACAGACCTCCTTCGG-ACTGTCTAAAAC 
llllll II HIM till III) I II || M II || if || 
CCTGGAGGATGGAACAGAATGGGAGAGCCACGGGGTTTCTGCTGAAGGAGAACATCAGGCCCTTTC — AGC 
1600 1610 1620 1630 1640 1650 1660 

1680 1690 1700 1710 1720 1730 1740 

TCCCCTTAGAAGTC-TCGTCAAGTTCCC--GGACGAAGAGGACA-GAGGAGACACAGTCCGA-AAAGT-TAT 

II INI II INI I III! I II llllll II III I III I I II III 

T-CTATGAGA — TCATCGTGA — CTCCCTTGTAC — CAGGACACCATGGGACCC-TCCCAGCATGTCTAT 
1670 1680 1690 1700 1710 1720 

1750 1760 1770 1780 1790 1800 

— TTTTC-CGGCAAA TCCTTTCCCTGTTTCGTGACACTCCACCCCTTGTGGACACTTGAGTGTCATC 

I II I III III I II II I II II II II I III III I | 
GCCTACTCTCAAGAAATGGCTCCCTCCCATG-CCCCAGA-GCTGCA-TCTAAAGCACA-TTG — GCAAGA 
1730 1740 1750 1760 1770 1780 


1810 1820 1830 1840 1850 1860 1870 

CTTGCGC-CGGAAGGTCAGGTGGTACCCG-TCTGTAGGGGCGGGGAGACAGAGCCGCGGGGGAGCTACGAGA 

III II I I II II IN II I I I II III II llllll I I I II I I 
CCTGGGCACAGCTGG-AGTGGGTGCCTGAGCCCCCTGAGCTGGG-GA-AGAGCCCC— CTTACCCACTACA 
1790 1800 1810 1820 1830 1840 1850 

1880 1890 1900 1910 1920 1930 

— ATCGACT-CAC-AGGGCGC CCCGGGCTTCGCAAATGAAACTTTTTTAAT-CTCAC-AAGTTTCGTC 

III II II I I! I III INI I I II I INI I I II III 
CCATCTTCTGGACCAACGCTCAGAACCAGTCCTTCTC-CGCCATCCTGAATGCCTCCTCCCGTGGCTTTGTC 
1860 1870 1880 1890 1900 1910 1920


1940 1950 1960 1970 1980 1990 2000 
CGGGCTCGGC-GGACCTATGGCGTC-GATCCTTAT-TACCTTAT-CCTGGC-GCCAAGATAAAACAACCAAA 

I I M III I IN I I II Nil II III I INI I I I lllll 
CTCCATGGCCTGGAGCCCGCCAGTCTGTATCACATCCACCTCATGGCTGCCAGCCAGGCTGGGGCCACCAAC 
1930 1940 1950 1960 1970 1980 1990 


2010 2020 2030 2040 2050 2060 2070 

AGCCTTGACTCCGGTACTAATTCTCCCTGCCGGCCCCCGTAAGCATA-ACGCGGCG-ATCTCCACTTTAAGA 

N I I III II lllll I II I I I I I I I II I II III II 

AG— TACAGTCC TCACCCTGATG — ACCTTGACCCCAGAGGGGTCGGAGCTACAC — ATCA 

2000 2010 2020 2030 2040 2050 


2080 2090 2100 2110 2120 2130 2140 

ACCTGGCCGCGTTCTGCCTGGTCTCGCT-TTCGTAAACGGTTCTTACAAAAGTAATTAGTTCTTGCTTTCAG 
mu i n n h i ii m 1 1 iiii i hi iiiii i ii in


TCCTGGGCCTGTTCGGCC — TCCTGCTGTTGCTCACCTGCCTCTGTGGAACTGCCTGGCTCT — GTTGCAG 
2060 2070 2080 2090 2100 2110 2120 

2150 2160 2170 2180 2190 2200 

CCTCCAAGCTTCTG-CTAGTCTATGGC-- AGCATCAAGGCTGGTATTTGCT-ACGGCTGAC— CGCTAC-- G 

I! till I I II llll II II II III II II I I III I I 

CC-CCAACAGGAAGAATCCCCTCTGGCCAAGTGTC CCAGACCCAGCTCACAGCAGCCTGGGCTCCTGG 

2130 2140 2150 2160 2170 2180 

2210 2220 2230 2240 2250 2260 2270 

CCGC-CGCAATAAGGGTACTGGGCGGCCCGTCGAAGGCCCTTTGGTTTCA-GAAACCCAAGGCCCCCCTCAT 

n i nn i n i n ill i mi i n i inn n mi 

GTGCCCACAATCATGG-AGGAGGATGCCTTCCAGCTGCCCGGCCTTGGCACGCCACCCATCACCAAGCTCA- 


2190 2200 2210 2220 2230 2240 2250

2280 2290 2300 2310 2320 2330 2340


ACCAACGTT — TCGACTTTG-ATTCTTGCCGGTACGTGGTGGTGGGTGCCTTAGCTCTTTCTCGATAGTTAG 

n i i n it i nun i in i n n n i i n ii 

--CAGTGCTGGAGGAGGATGAAAAGAAGCCGGT--GCCCTGG-GAGTCCCATAACAGCTCAGAGACCTGTGG 
2260 2270 2280 2290 2300 2310 2320 

X 

AC 

I 

CCTCCCCACTCTGGTCCAGACCTATGTGCTCCAGGGGGACCCAAGAGCAGTT 
2330 2340 2350 2360 2370 


6. ELLIS-012-FIG2AB.SEQ (1-2350)

N61379 Sequence encoding porcine beta-follicle stimulatin

ID N61379 standard; cDNA; 728 BP.
AC N61379;
DT 03-AUG-1992 (first entry)
DE Sequence encoding porcine beta-follicle stimulating hormone (FSH).
KW Superovulation therapy; hypophyseal disorder; gonadal regression;
KW infertility; ss.
OS Pig.
FH Key Location/Qualifiers
FT CDS 1..54
FT /#tag= a
FT transit_peptide 55..108
FT /#tag= b
FT mat_peptide 109..444
FT /#tag= c
FT CDS 445..726
FT /#tag= d
PN FR2565599-A.
PD 13-DEC-1985.
PF 07-JUN-1985; 508647.
PR 08-JUN-1984; US-618466.
PA (INTE-) INTEGRATED GENETICS.
PI Beck AK;
DR WPI; 86-030537/05.
DR P-PSDB; P61785.
PT New DNA coding for porcine beta-follicle stimulating hormone -
PT useful for raising antibodies, inducing ovulation etc., and new
PT expression vectors
PS Disclosure; Page 3; 14pp; French.

CC Total RNA is extracted from pig hypophyseal glands and used to
CC construct a library of cDNA. The library was screened using two
CC oligonucleotide probes designated PF55 and PF434. These were
CC ligated to give the complete sequence including the untranslated
CC flanking regions. This sequence has been inserted into pBR322 and
CC deposited as NRRL B-15793. The final vector is ppFSH.



Initial Score = 137 Optimized Score = 321 Significance = 7.13
Residue Identity = 49% Matches = 389 Mismatches = 303
Gaps = 94 Conservative Substitutions = 0


10 20 30 40 50 60 70 
TGAACTGCTGAGTGGATAAACAGCACGGGATATCTCTGTCTAAAGGAATATTAC-TACACCAGGAAAAGGAC 

III I III II I II 
GTACTTTCAC— GGTCTCGTAC 
X 10 20 


80 90 100 110 120 130 140 

AC — ATTCGACAACAGGAAAGGAGCCTGTCACAGA — AAACCACAG-TGTCCTG-TGCATGTGACATTTCGC 

II II II I I II I III II I II II II I I I I II I 
ACCAGCTCCTTAATTGTTTGGTTTCCACCCCAAGATGAAGTCGCTGCAGTTTTGCTTCCTATTCTGTTGC — 
30 40 50 60 70 80 90 

150 160 170 180 190 200 210 

CATGGGAA-ACAACTGTTACAACGTGGTGGTCA-TTGTGCTGCTGCTAGTGGGCTGT-GAGAAGGTGGGAGC 
III II II III Mil I I II I II I I I I II Hill I 1111 
--TGGAAAGCCATCTGCTGCAA— TAGCTGTGAGCTGACCAACATCACCATCACAGTGGAGAAAG-AGGAG- 
100 110 120 130 140 150 


220 230 240 250 260 270 280 

CGTGCAGAACTCCTGTGATAACTGTCA— GCCTGGTACT-TTCTGCAGAAAATACAATCCAGTCTGCAAGAG 

II till Ill llll III II II I I III I I I I I II | | || 

— TG— TAACTTCTG-CATAAGCATCAACACCACGTGGTGTGCTG-- GCTATTGC — TACACCCGG — GAC
160 170 180 190 200 210

290 300 310 320 330 340 

CTGCCCTCCAAGTACCTTCTCCAGCATAGGTGGACAGCC-GAA — CTGTAACATC-TGCAGAGTGTGTGCA 

m i iiii ii mu i m ii m mi i ii i ii mi i 

CTGGTATACAAGGAC CCAGCCAGGCCCAACATCCAGAAAACATGTACCTTCAAGGAGCTGGTGTACG 

220 230 240 250 260 270 280 

350 360 370 380 390 400 410 420 

GGCTATTTCAGGTTCAAGAAGTTTTGCTCCTCTACCCACAACGCGGAGTGTGAGTGCATTGAAGGATTCCAT

i i i ii i i i i nil i i i n n i n i n n n i in 

AGACCGTGAAAGTACCTG--GCTGTGCT-CACCATGCA-GACTCCCTGTATACGT-- -AT-CCAGTA-GCCAC 
290 300 310 320 330 340 

430 440 450 460 470 480 490 

TGCTTGGGGCCACAGTGCACCAGATGTGAAAAGGACTGCA-GGCCTG-GCCAGGAGCTAAC-GAAGCAGGGT 

II I III III I II Hill 1 III I I 1 ill II III I I II I 

CGAAT — GTCACTGTG-GCAAG-TGTGACAGTGACAGTACTGACTGCACCGTGAGAGGCCTGGGGCCCAGC 
350 360 370 380 390 400 410 

500 510 520 530 540 550 

TGCAAAACCTGTAGCTTGGGAACATTTAATGACCAGAACGGTAC-TGGCGT-CTGTCGACCCTGGAC-GAAC 

II HI II II I I III I II I I II I I I II I Hill I I ill 
TACTGCTCCTTCAG — TGAAATGAAAGAATAAAGAGCAGTGGACATTTCATGCTTCCTACCCTTGTCTGAAG 
420 430 440 450 460 470 480 

560 570 580 590 600 610 620 

TGCTCTCTAGACG — GAAG — GTCTGTGCTTAAGACCGGGACCACGGAGAAGGACGTGGTGTGTGGACCCCC 

i i mu m ii nil n i i m i i i n i ii i nil 

GAC — CAAGACGTCCAAGAAGTTTGTGTGTA — CATGTGCCCAGGCTGCA-AAC-CACTATGAGAGACCCC 
490 500 510 520 530 540 

630 640 650 660 670 680 690 

TGTGGTGAGCTTCTCTCC-CAGTACCACCATTTCTGTGACTCCAGAGGG--AGGACCAGGAGGGCA-CTCCT 

II I II II III llll II II I Hill 1 III II II 1 II 
ACTGAT-CCCTGCTGTCCTGTGGAGGAGGAGCTCCAGGAATGCAGAGTGCTAGGGCCTCAGTCCCATCACCA 
550 560 570 580 590 600 610 620 



700 710 720 730 740 750 760

TGCAGGTCCTTACCTTGTTCCTGGCGCTGACATCGGCTTTGCTGCTGGCCCTGATCTT — CATTACTCTCCT

ii i ii 111 mi i in i in ini i in mini i 

CTCAACCCTGTATTTTGGGTCTGG — TTCCATAAG-TTTTATTCGGTCTTTTTTTTTTAAATTACTC-AAT 
630 640 650 660 670 680 

770 780 790 800 810 820 830 

G— TTCTCTGTGCTCAAATGGATCAGGAAAAAATTCCCCCACATATTCAAGCAACCATTTAAGAAGACCACT 

i ii ii i i ii i i ii ii i mi 

GAATTTTAT-TACATTTATAATTGTAGCAAGGAT— CATCACAA 
690 700 710 720 X 

840 850 

GGAGCAGCT CAAGAGGAAGATG 


7. ELLIS-012-FIG2AB.SEQ (1-2350)

N60741 Sequence of porcine beta-follicle stimulating horm

ID N60741 standard; cDNA; 728 BP.
AC N60741;
DT 28-FEB-1992 (first entry)
DE Sequence of porcine beta-follicle stimulating hormone (FSH) cDNA.
KW Hypophyseal; disorder; tumour; superovulation; infertility; therapy;
KW diagnosis; gonadal regression; ss.
OS Sus scrofa.
FH Key Location/Qualifiers
FT mRNA 1..54
FT /*tag= a
FT transit_peptide 55..114
FT /*tag= b
FT mat_peptide 115..444
FT /*tag= c
FT mRNA 445..726
FT /*tag= d
PN FR2565599-A.
PD 13-DEC-1985.
PF 07-JUN-1985; 508647.
PR 08-JUN-1984; US-618466.
PR 20-OCT-1986; US-921867.
PA (INTE-) INTEGRATED GENETICS.
PI Beck AK;
DR WPI; 86-030537/05.
DR P-PSDB; P60821.
PT New DNA coding for porcine beta-follicle stimulating hormone -
PT useful for raising antibodies, inducing ovulation etc., and new
PT expression vectors
PS Disclosure; Page 3; 14pp; French.
CC Total RNA extracted from pig hypophyseal gland was used to construct
CC a library of cDNA. The library was screened using two
CC oligonucleotide probes (N60742, N60743). Two sequences, designated
CC PF55 and PF434, were isolated. These were ligated to give the
CC complete sequence for beta-FSH including the untranslated flanking
CC regions. This sequence has been inserted into pBR322 and deposited
CC as NRRL B-15793.

SQ Sequence 728 BP; 186 A; 184 C; 168 G; 190 T; 
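The SQ line above states a total length and per-base counts, so the two can be cross-checked against each other. The following is a minimal sketch of that check; the function name `parse_sq_line` and the regular expressions are illustrative assumptions, not part of the search program or the database format specification.

```python
import re

def parse_sq_line(line):
    """Return (declared_length, {base: count}) from an SQ summary line.

    Hypothetical helper: assumes the layout 'SQ Sequence <n> BP; <n> A; ...'.
    """
    length = int(re.search(r"Sequence\s+(\d+)\s+BP", line).group(1))
    # Each composition field looks like '186 A;' with a single-letter base.
    counts = {base: int(n) for n, base in re.findall(r"(\d+)\s+([ACGT]);", line)}
    return length, counts

sq = "SQ Sequence 728 BP; 186 A; 184 C; 168 G; 190 T;"
length, counts = parse_sq_line(sq)
total = sum(counts.values())
print(length, total, total == length)
```

For this record the four counts add up to the declared 728 BP. A record that also lists an "Others" count would need that field added to the sum.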

Initial Score = 137 Optimized Score = 322 Significance = 7.13 

Residue Identity = 49% Matches = 390 Mismatches = 302

Gaps = 94 Conservative Substitutions = 0 
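The summary above reports matches, mismatches, gaps and a residue identity percentage. The sketch below shows one way these figures can be related. It assumes that identity is the number of matches divided by all aligned columns (matches + mismatches + gaps), truncated to a whole percent. That formula is an assumption inferred from the numbers in this listing, not taken from the search program's documentation, and `residue_identity` is a hypothetical name.

```python
def residue_identity(matches, mismatches, gaps):
    """Percent identity over all aligned columns, truncated to an integer.

    Assumption: the denominator counts matches, mismatches and gaps alike.
    """
    columns = matches + mismatches + gaps
    return (100 * matches) // columns

# Values from the summary above: 390 matches, 302 mismatches, 94 gaps.
print(residue_identity(390, 302, 94))
```

With these inputs the function gives 49, which agrees with the reported 49%. The same assumption also reproduces the 48% and 46% reported for the other alignments in this listing whose counts are legible.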

10 20 30 40 50 60 70 

TGAACTGCTGAGTGGAT AAACAGCACGGGATAT CTCTGTCTAAAGGAATATTAC-TACACCAGGAAAAGGAC 

III I III II I II 

GTACTTTCAC— GGTCTCGTAC 
X 10 20 



80 90 100 110 120 130 140 

AC — ATTCGACAACAGGAAAGGAGCCTGTCACAGA — AAACCACAG-TGTCCTG-TGCATGTGACATTTCGC 

I! II II I I II I III II I I I II II I I I I II I 
ACCAGCTCCTTAATTGTTTGGTTTCCACCCCAAGATGAAGTCGCTGCAGTTTTGCTTCCTATTCTGTTGC — 
30 40 50 60 70 80 90 

150 160 170 180 190 200 210 

CATGGGAA-ACAACTGTTACAACGTGGTGGTCA-TTGTGCTGCTGCTAGTGGGCTGT-GAGAAGGTGGGAGC 

iii ii ii iii i iii i i ii i ii i i i i ii mu i mi 

— TGGAAAGCCATCTGCTGCAA — TAGCTGTGAGCTGACCAACATCACCATCACAGTGGAGAAAG-AGGAG- 
100 110 120 130 140 150 

220 230 240 250 260 270 280 

CGTGCAGAACTCCTGTGATAACTGTCA— GCCTGGTACT-TTCTGCAGAAAATACAATCCAGTCTGCAAGAG 
II till III 1111 III II II I I III 1 1 I I I II I I II 

--TG-TAACTTCTG-CATAAGCATCAACACCACGTGGTGTGCTG— GCTATTGC--TACACCCGG — GAC 
160 170 180 190 200 210 

290 300 310 320 330 340 

CTGCCCTCCAAGTACCTTCTCCAGCATAGGTGGACAGCC-GAA— CTGTAACATC-TGCAGAGTGTGTGCA 

in i mi n inn i in n in mi i n i n mi 1 

CTGGTATACAAGGAC CCAGCCAGGCCCAACATCCAGAAAACATGTACCTTCAAGGAGCTGGTGTACG 

220 230 240 250 260 270 280 

350 360 370 380 390 400 410 420 

GGCTATTTCAGGTTCAAGAAGTTTTGCTCCTCTACCCACAACGCGGAGTGTGAGTGCATTGAAGGATTCCAT 

n i i n i i i i mi ii i n n i n i n n iii in 

GGACCGTGAAAGTACCTG — GCTGTGCT-CACCATGCA-GACTCCCTGTATACGT — AT-CCAGTA-GCCAC 
290 300 310 320 330 340 

430 440 450 460 470 480 490 

TGCTTGGGGCCACAGTGCACCAGATGTGAAAAGGACTGCA-GGCCTG-GCCAGGAGCTAAC-GAAGCAGGGT 
II I III III I II Hill I III II I III II III I I II I 

CGAAT — GTCACTGTG-GCAAG-TGTGACAGTGACAGTACTGACTGCACCGTGAGAGGCCTGGGGCCCAGC 
350 360 370 380 390 400 410 

500 510 520 530 540 550 

TGCAAAACCTGTAGCTTGGGAACATTTAATGACCAGAACGGTAC-TGGCGT-CTGTCGACCCTGGAC-GAAC 

II ill II II I I III I II I I II I I I II I Hill I I III 
TACTGCTCCTTCAG— TGAAATGAAAGAATAAAGAGCAGTGGACATTTCATGCTTCCTACCCTTGTCTGAAG 
420 430 440 450 460 470 480 

560 570 580 590 600 610 620 

TGCTCTCTAGACG — GAAG — GTCTGTGCTTAAGACCGGGACCACGGAGAAGGACGTGGTGTGTGGACCCCC 

i i inn in n mi n i i in i i i n ini mi 

GAC — CAAGACGTCCAAGAAGTTTGTGTGTA— CATGTGCCCAGGCTGCA-AAC-CACTATGAGAGACCCC 
490 500 510 520 530 540 

630 640 650 660 670 680 690 

TGTGGTGAGCTTCTCTCC-CAGTACCACCATTTCTGTGACTCCAGAGGG — AGGACCAGGAGGGCA-CTCCT 

ii i n n in i i i i ii n i inn i in n n i n 

ACTGAT-CCCTGCTGTCCTGTGGAGGAGGAGCTCCAGGAATGCAGAGTGCTAGGGCCTCAGTCCCATCACCA 
550 560 570 580 590 600 610 620 

700 710 720 730 740 750 760 

TGCAGGTCCTTACCTTGTTCCTGGCGCTGACATCGGCTTTGCTGCTGGCCCTGATCTT— CATTACTCTCCT 

ii i n in mi i in i in i i i i i iii mini I 

CTCAACCCTGTATTTTGGGTCTGG — TTCCATAAG-TTTTATTCGGTCTTTTTTTTTTAAATTACTC-AAT 
630 640 650 660 670 680 

770 780 790 800 810 820 830 

G — TTCTCTGTGCTCAAATGGATCAGGAAAAAATTCCCCCACATATTCAAGCAACCATTTAAGAAGACCACT 

i n i i i i n i i n ii i mi 

GAATTTTAT-TACATTTATAATTGTAGCAAGGAT — CATCACAA 
690 700 710 720 X



840 850 

GGAGCAGCT CAAGAGGAAGATG 


8. ELLIS-012-FIG2AB.SEQ (1-2350)

Q03847 Porcine beta FSH subunit.

ID Q03847 standard; cDNA; 780 BP.
AC Q03847;
DT 24-AUG-1990 (first entry)
DE Porcine beta FSH subunit.
KW Luteinizing hormone; follicle stimulating hormone;
KW recombinant cDNA; alpha subunit; beta subunit; ungulate; ss.
OS Bos taurus.
FH Key Location/Qualifiers
FT CDS 107..498
FT /*tag= a
FT /product=Porcine beta FSH
PN WO9002757-A.
PD 22-MAR-1990.
PF 02-SEP-1988; 030949.
PR 02-SEP-1988; WO-U03049.
PA (INTE-) Integrated genetics.
PI Beck A, Bernstine E, Hsiung N, Kelton C, Lerner T, Reddy VB, Chappel SC;
DR WPI; 90-115954/15.
PT Biologically active ungulate LH and FSH - produced by recombinant methods.
PS Disclosure; Fig 10; 66pp; English.
CC LH and FSH comprise an alpha and a beta subunit; both subunits can be
CC synthesised in a single cell contg. an expression vector comprising
CC heterologous DNA encoding one subunit.
CC See also Q03843-Q03851.
SQ Sequence 780 BP; 201 A; 195 C; 184 G; 200 T;


Initial Score = 137 Optimized Score = 339 Significance = 7.13
Residue Identity = 48% Matches = 405 Mismatches = 337
Gaps = 93 Conservative Substitutions = 0


10 20 30 40 50 60 

ATGTCCATGAACTGCTGAGTGGATAAACAGCACGGGATATCT CTGTCTAAAGGAATATTAC-TACAC 

11 in mm i ii it iii ii i ii i i iii i iii 

GAGTGGCTACCTGGATACGTA-TACAGGGAGTCTGCATGGTGAGCACAGCCA-AGTACTTTCAC 
X 10 20 30 40 50 60 


70 80 90 100 110 120 130 

CAGGAAAAGGACAC — ATTCGACAACAGGAAAGGAGCCTGTCACAGA— AAACCACAG-TGTCCTG-TGCAT 

II I III! II II I I II I III II II I II II I I I 
— GGTCTCGTACACCAGCTCCTTAATTGTTTGGTTTCCACCCCAAGATGAAGTCGCTGCAGTTTTGCTTCCT 


70 80 90 100 110 120 130

140 150 160 170 180 190 200


GTGACATTTCGCCATGGGAA-ACAACTGTTACAACGTGGTGGTCA-TTGTGCTGCTGCTAGTGGGCTGT-GA 

I II I III II II III I III I I II I II I I I I II II 

ATTCTGTTGC TGGAAAGCCATCTGCTGCAA— TAGCTGTGAGCTGACCAACATCACCATCACAGTGGA 

140 150 160 170 180 190 


210 220 230 240 250 260 270 

GAAGGTGGGAGCCGTGCAGAACTCCTGTGATAACTGTCA — GCCTGGTACT-TTCTGCAGAAAATACAATCC 

iii i mi ii mi m mi in n n i i in i i i i i i 

GAAAG-AGGAG — TG — TAACTTCTG-CATAAGCATCAACACCACGTGGTGTGCTG — GCTATTGC— TAC 
200 210 220 230 240 250 


280 290 300 310 320 330 

AGTCTGCAAGAGCTGCCCTCCAAGTACCTTCTCCAGCATAGGTGGACAGCC-GAA — CTGTAACATC-TGC 

iii n in i mi n inn i in n in mi i n i 

ACCCGG — GACCTGGTATACAAGGAC CCAGCCAGGCCCAACATCCAGAAAACATGTACCTTCAAGG 


340 350 360 370 380 390 400 

AGAGTGTGTGCAGGCTATTTCAGGTTCAAGAAGTTTTGCTCCTCTACCCACAACGCGGAGTGTGAGTGCATT 

II INI II I I III I I I till I I I II II I II I II 

AGCTGGTGTACGAGACCGTGAAAGTACCTG— GCTGTGCT-CACCATGCA-GACTCCCTGTATACGTATCCA 
330 340 350 360 370 380 390 

410 420 430 440 450 460 470 

GAAGGATTCCATTG-CTTGGGGCCACAGTGCACCAGATGTGA-AAAGGACTGCAGGCCTGGCCAGGAGCTAA 

i ii i i ii i i i ii mi in n i i 1 1 1 1 1 1 1 n i in n 

GTAGCCACCGAATGTCACTGTGGCA-AGTGTGACAG-TGACAGTACTGACTGCA-- -CCGTGAGAGG-CCT-- 
400 410 420 430 440 450 

480 490 500 510 520 530 540 

CGAAGCAGGGTTGCAAAACCTGTAGCTTGGGAACATTTAATGACCAGAACGGTAC-TGGCGT-CTGTCGACC 

I II III III II II I I III I II I I II I I I II 1 III 

-GGGGCCCAGCTACTGCTCCTTCAG— TGAAATGAAAGAATAAAGAGCAGTGGACATTTCATGCTTCCTACC 
460 470 480 490 500 510 520 

550 560 570 580 590 600 610 

CTGGAC-GAACTGCTCTCTAGACG— GAAG — GTCTGTGCTTAAGACCGGGACCACGGAGAAGGACGTGGTG 

n i ini i i inn in n nil n i i in i i i i n 

CTTGTCTGAAGGAC — CAAGACGTCCAAGAAGTTTGTGTGTA— CATGTGCCCAGGCTGCAAACCACTATG 
530 540 550 560 570 580 590 

620 630 640 650 660 670 680 

TGTGGACCCCCTGTGGTGAGCTTCTCTCCCAGTACCACCATTTCTGTGACTCCAGAGGG--AGGACCAGGAG 

I I III III II I II I I I I I II II I Hill I III II 

AGAGACCCCACTGATCCCTGC-TGTC-CTGTGGAGGAGGAGCTCCAGGAATGCAGAGTGCTAGGGCCTCAGT 
600 610 620 630 640 650 660 

690 700 710 720 730 740 750 

GGCA-CTCCTTGCAGGTCCTTACCTTGTTCCTGGCGCTGACATCGGCTTTGCT-GCTGGCCCTGATCTTCAT 

n i n n i n in mi i in i in i i i i i i n 

CCCATCACCACTCAACCCTGTATTTTGGGTCTGG — TTCCATAAGTTTTATTCGGTCTTTTTTTTTTAAAT 
670 680 690 700 710 720 730 

760 770 780 790 800 X 810 820 

TACTCTCCTG — TTCTCTGTGCTCAAATGGATCAGGAAAAAATTCCCCCACATATTCAAGCAACCATTTAAG 

inn n n i i i i n i i n n i n 

TACTC-AATGAATTTTAT-TACATTTATAATTGTAGCAAGGATCATCACAA 
740 750 760 770 780 


830 840 850 

AAGACCACTGGAGCAGCTCAAGAGGAAGA 


9. ELLIS-012-FIG2AB.SEQ (1-2350)

Q28758 Partial sequence of tumour suppressor gene U10.

ID Q28758 standard; DNA; 4328 BP.
AC Q28758;
DT 25-FEB-1993 (first entry)
DE Partial sequence of tumour suppressor gene U10.
KW CaN19; tumour suppressor gene; cancer; therapy; ss.
OS Homo sapiens.
PN WO9215602-A.
PD 17-SEP-1992.
PF 28-FEB-1992; U01624.
PR 28-FEB-1991; US-662216.
PA (DAND ) DANA FARBER CANCER INST INC.
PI Sager R;
DR WPI; 92-331663/40.
PT Diagnosis and treatment of cancer - using candidate tumor suppressor
PT genes or the corresp. antibodies.


CC An adaptation of the subtractive hybridization technique was used
CC which utilizes a biotinylation-based subtraction procedure instead
CC of hydroxyapatite as previously used. In this procedure, a single
CC strand phagemid cDNA library from normal cell polyA+ mRNA is
CC hybridized with excess biotinylated tumor polyA+ mRNA, and the
CC resulting double stranded sequences are removed by binding to
CC streptavidin. The remaining single-stranded phagemid cDNAs are
CC converted to double-stranded form and used to transform bacterial
CC host cells. The resulting subtracted cDNA library is differentially
CC screened with total cDNA from normal and tumor cells. This method
CC produced some 20 additional cloned cDNAs. Also found by this
CC method were several genes which, on the basis of the partial DNA
CC sequences, appear to be novel sequences not previously entered
CC into GENBANK. The portion of the cDNAs so sequenced represents
CC part of the coding region and/or part of the 3' untranslated region
CC of each cDNA (see Q28749-58).


SQ Sequence 4328 BP; 1236 A; 970 C; 912 G; 1210 T;

Initial Score = 135 Optimized Score = 953 Significance = 7.01
Residue Identity = 46% Matches = 1140 Mismatches = 1017
Gaps = 279 Conservative Substitutions = 0





X 10 20 


ATGTCCATGAACTGCTGAGTGG 
I 1 til 1 1 1 i 1 1 I II 

CAGTTATGTTCCTGTTTCGTTATTGGTACCAAAACTCTTGCCAGATAACCAGTTTCATGAACTGTT — TGT 
1990 2000 2010 2020 2030 2040 2050 


30 40 50 60 70 80 90 

ATAAACAGCACGGGATATCTCTGTCTAAAGGAATATTACTACACCAGGAAAAGGACACATTCGAC— AACAG 

II INI I II III III I II II I I I III II II II I III 
AT-GGCAGCCCATGTTCTCTAATGCCACTGCTCTGTT-TTA-AAAACTCAGAGG-CAATTTTTACATATCAG 
2060 2070 2080 2090 2100 2110 2120 

100 110 120 130 140 150 

GAAAGGAGCCTGTCACAGAAAACCACAG-TGTCCTG-TGCATGTGACATTTCGCCATGGGAAACAACTG — 

III I I I I I II I I II II III II II II I II mu 

TAATTG — TTTTTATA-ATTTGCATGGTTTTCATGAAACAT-TGCTATGCATTTATTAGGAAAAACTGAAT 
2130 2140 2150 2160 2170 2180 2190 


160 170 180 190 200 210 220 

TTACAACGTGGTGGTC — ATTGTGCTGCT-GCTAGTGGGCTGTGAGAAGGTGGGAGCCGTGCAGAACTCCT 

it i i mi i i ii i i m i i i i i i ii n m n i 

TTCCCAACAGGTGAACTGAAAAGTTATTTTAACTATTATAC-ATAATCA-GAAAGATCC-TGC — CTCTACG 
2200 2210 2220 2230 2240 2250 

230 240 250 260 270 280 290 

GTGATAACTGTCAGCCTGGTACTTTCTGCAGAAAATACAATCCAGTCTGCAAGAGCTGCCCTCCAAGTACC- 

i ii i i i ill ill mi n in i in i nil in i n 

GAATTAGC — TAAACCTAAAAATGTTTGCATTAA — TGAATAAATTCTTC CTGCATTCCTTGGCCCA 

2260 2270 2280 2290 2300 2310 2320 

300 310 320 330 340 350 360 

-TTCTCCAGCATAGGTGGACAGCCGAACTGTAACATCTGCAGAGTGTGTGCAGGCTATTTCAGGTTCAAGAA 

mi n i nil i n i i n i n i i in i i i i in nil 

GTTCTGGAG— TTGGTGACCTTTATCACAATTATAT-TTTAG — GCGGCCAGTGAACTGCTGCTTC-AGAA 
2330 2340 2350 2360 2370 2380 


370 380 390 400 410 420 430 

GT — TTTGCTC — CTCT-ACCCACAACGCGGAGTG — TGAGTGCA— TTGAAGGATTC-CATTGCTTGGGG 

n i n i mi ii n i n i in n n n n i nn in i 

GTCCATAGCCCAGCTCTGAACTTTCTCGATAAATGCCATCAGTTCACCTTTAAAGACACACATTCCTTTG— 
2390 2400 2410 2420 2430 2440 2450 

440 450 460 470 480 490 

rririr:Tfvr ArrirATr-TrAA a *r ■ ArTr/’ArrfrT/'f/'/'Arf'*r^TA ArrAf'/'r ttaas a aaat 


I I I Mil ill II I III I III I II INI I III I II 

--AAA-TCCACCCAGTGTTTAA — AAAGCA-ACTTGGAAATTTAC-ACATTAGCATTGTACTTTCTAGCCC 


2460 2470 2480 2490 2500 2510

500 510 520 530 540 550 560


TGTAGCTTGGGAACATTTAATGACCAGAACGGTACTG-GCGTCTGTCGACCCTG-GACGAAC-TGCTCTCTA 

II III II II till I III II III I llll 1111 I II 
— TAATTTGTGAGGTTGCAGCTATCATTA-TATTCTGCATGTATGTATAACCTGTTGTGAACAATCATACTT 


2520 2530 2540 2550 2560 2570 2580

570 580 590 600 610 620 630


GACGGAAGGTCTG-TG-CTTAAGACCGGGACCACGGAGAAGGAC-GTGGTGTGTGGACCCCCTGTGGTGAGC 

II II HI II III III II II II II II I II II III I 

AACAAAACTACTGATGGTTTATGAC — AACGTAGGGTAACTACAGTTCATTCTGTTCC AGGTTATA 

2590 2600 2610 2620 2630 2640 2650 

640 650 660 670 680 690 700 

TTCTCTCCCAGTACCACCATTTCTGTGACTCCAGAGG — GAGGACCAGGAGGGCACTCCTTGCAGGTCCTTA 

i n i n mi i i i in n i n n n n i i n 

TAAAACTGCATTTCCTGAATTTGGTTAAAAACTAAGGATGATGGATTCGAAAACAGTCTTTTAAATTAGTTT 
2660 2670 2680 2690 2700 2710 2720 

710 720 730 740 750 760 770 

CCTTGTTCCTGGCG — CTGACATCGGCTTTGCTGCTGGCCCTGA-TC-TTCATT ACTCTCCTGTTCTCTGT— 

n i n i n n n n n inn n n i i nn i i n 

ATATGCTTTAGGTGTTTTGGAATTTGCCTTCTTGAACTTCCTGAGTCACACAGAAAGCAACTGTACACAGTA 
2730 2740 2750 2760 2770 2780 2790 

780 790 800 810 820 830 840 

-GCTCAAATGGATCAGGAAA AAATTCCCCCACATATTCAAGCAACCATTTAAGAAGACCACTG-GAGC 

i in in i in i n n i n i n mini i 

GAATTCTGTGGCGCAGACCATGCTGTATTAACACATCACTTGCTGTTTCCTACTGAGTGTACCACTGCCTTC 


2800 2810 2820 2830 2840 2850 2860

850 860 870 880 890 900


AGCTCAAG — AGGAAGATGCTTGTAGCTGCCGA-TGTC — CACAGGAAGAAGA-AGGAGGAGGAGGAGGC 

n n nil in n mi nn i i n i i i nn i i 

CCTTCTAGCCCAGGAGAATG-TTTACTCAGTTTAGTGTCTTGTATTTCTATAATACACCAACAGGA — ATGG 
2870 2880 2890 2900 2910 2920 2930 • 

910 920 930 940 950 960 970 

TA-TGA-GCTGTGATGTACT — ATCCTAGGAGATGTGTGGGCCGAAACCGAGAA-GCACTAGGACCCCA-CC 

n i i nn in i i in n in n i nn i n i i 

TAGTCACACTGTCTTGAAATTGAATCT-GTCCATCTGT — TTATAATCAAGAACATATCAGAAATATATAG 
2940 2950 2960 2970 2980 2990 3000 

980 990 1000 1010 1020 1030 

ATCCTGTGGAACA-GCACAAGCAACCC-CACCACCCTGTT CTTACACAT CATCCT — AGATGAT — G 

III I II I 1 III II III 1 Hill 1 II Hill II II I I I 

GTCCCAGGTAATACTCCCAAACATCCCACTTTTTACTGTTTCAGGCCATCATATCATTCTTAAGCTACTTGG 
3010 3020 3030 3040 3050 3060 3070 

1040 1050 1060 1070 1080 1090 1100 

TGTGGGCGCGCACCTCATCCAAGTCTCTTCTAACGCTAACATATTTGTCTTTACCTTTTTTAAATCTTTTTT 

nn i i nn n m i n i inn hi i n in 1 

GGTGGTAGTAGAGGATTAGGTTGTCTATTATAAAACCAAAA CTCATT-CGTTTAATGAA-CTTGACT 

3080 3090 3100 3110 3120 3130 3140 

1110 1120 1130 1140 1150 1160 1170 

TAAATTTAAATTTTATGTGTGTGAGTGTTTTGCCTGCCT-GTATGCACACGTGTGTGTGTGTGTGTGTGTGA 

11 I I III 1 III llll II II II 1 II 1 1 I 1 I II 

GTCAT — ACCTCTAT TTAGT-AATTGCGAGGGTAAGATTCATA-GTAGGAATATTGGAAATTTTGG 

3150 3160 3170 3180 3190 3200 

1180 1190 1200 1210 1220 1230 1240 

rArTrrTfVATcrrTr.ACPArcT,-Ai'4&nir'4AAr."’r'TTrr — TTrr at a at a A/'TrrArTT lt f* r at re n p 


CACT-CTGA-GAATAAATAGG-CATATGATACCCACTTGGACTTTTAACAAAAGTAAAGGAATAAAT — TTG 
3210 3220 3230 3240 3250 3260 3270 

1250 1260 1270 1280 1290 1300 1310 

TGAGCCGGNNNGATAGGTCGGGACGGAGACCTGTCTTCTTATTTTAACGTGACTGTATAATAAAAAAAAAAT 

II I I II il I I Mil II III llll II I I I II 

CATATAGGTTTGGAAAGTGAGGCAGCAATGCTGTTAACTGCATTTGTTGTGATGGTGCATTTGATTGAAGCA 
3280 3290 3300 3310 3320 3330 3340 

1320 1330 1340 1350 1360 1370 1380 

GATATTTCGGGAAT-TGTAGAGATTGTCCTGACACCGTTCTAGTT AATGATCTAAGAGGAATTGTTGA 

i i i ii i i ii i iii iii ii ii i ii i ii ii i mu i 

GCT-TGTCTTTATTATGCA-AGACTGTGTAGAGTTTTTTTTTTTTTGGCATTGTACTTTTTGTTTTTGTT-A 
3350 3360 3370 3380 3390 3400 3410 

1390 1400 1410 1420 1430 1440 1450 

TACGTAGTATACTGTATATGTGTATG-TATATGTATATGTATATATAAGACTCTTTTACTGTCAAAG-TCAA 

ii i i i i ii i iii i iiii mm i ii iii ii mm n 

TAAGGAAGACAGAACAAACTGGAATGTTTTATGATGTTGTATAGCAATCGCTTTTTACCTTTCAAAGTTCCG 


3420 3430 3440 3450 3460 3470 3480

1460 1470 1480 1490 1500 1510 1520


CCT-AGAGTGTCTGGTTACCAGGTCAATTTTATTGGACATTTTACGTCACACACACACACACACACACACAC

I I I III llll I I I III III I II I I I III II 

GGTAAAAATGT — GTTA-TATCTGTAGTTTTTTGTTTTTGTTTTTTTTTA-AAGCAC TAC 

3490 3500 3510 3520 3530 3540 

1530 1540 1550 1560 1570 1580 1590 

ACACGTTTATACTACGTACTGTTATCGGTATTCTA — CGTCATATAA — TGGGATAGGGTAAAAGGAAACC 

I llll llll I I I II II II I I llll III I III I I III III I 

ATCTGTTTTCACTAATTGTTAATTTCTGT-TTGAACCCTTCATTTAATTTTCTCATA-GATTTAAGTAAA-C 
3550 3560 3570 3580 3590 3600 

1600 1610 1620 1630 1640 1650 

AAAGA-GT-GAGTG-ATATTATTGTGGAGGTGAC-AGACTACC-- CCTTCTGGGTACGTA-GGGACAGACCT 

INI II II I I I I III I II I II III I III I || | 

AAGGATGTATTTTGCACACGCTCGCACTTATGTCTATTTTAACAATCTCCTGCATCTGTATTTTATAGTCAG 
3610 3620 3630 3640 3650 3660 3670 3680 

1660 1670 1680 1690 1700 1710 1720 

CCTTCGGACTGTCTAAAACTCCCCTTAGAAGTCTCGTCAAGTTCCCGGACGAAGA-GGACAGAGGAGACACA 

iiii in ii i I ii iii i mu n i i m n n 

CCTTTTGACCACCTGGTGCCAGCTATATAAG— GAATAAAGTT GATTCATATCAACATTAGAACTCCA 

3690 3700 3710 3720 3730 3740 

1730 1740 1750 1760 1770 1780 

GTCCGAAA — AGT-TAT TTTTCCGGCAAATCCTTTCC CTGTTTC — GT-GACACTCCACC 

llll III I I I I II I II I II 1 II I llll II III 1 

GTCCCAAACTAATCTGTCAGGTTCACTGGTACATAAATACCTAGGAAATATTTTTCCAGTCTACATTTGGTG 
3750 3760 3770 3780 3790 3800 3810 

1790 1800 1810 1820 1830 1840 1850 

CCTTGTGGA-CACTTGAGTGTCATCCTTGCGCCGGA-AGGTCAGGT-GGTACCCGTCTGTAGGGGCGGGGAG 

I llll I I I I II II I I III I I I I llll II II 

CTATGTGCAGTAACTAATAGTACTCTTACCAGAGGAGAAATTATATAACGACCCTGCTAATATCTTTCTTAG 
3820 3830 3840 3850 3860 3870 3880 3890 

1860 1870 1880 1890 1900 1910 1920 

ACAGAGCCGCGGGGGAGCTACGAGAATCGACTCACAGGGCGCCCCGGGCTTCGCAAATGAAACTTTTTTAAT 

I II I II I II I III I II I I II I II I II III II 

TTATTTGCTCCTTCAAATTA-AAAAAGCAACTAAGAQAAAG — AAAAACATTGT AGAT — ATCTATTT — AT 


3900 3910 3920 3930 3940 3950

1930 1940 1950 1960 1970 1980


PTC APAACTT — rrr-Tr^^r-rrTrrrrrrArrTATr^rrTr'rAyrrrr at t Arr^jr hjrrrrrrrrrA 


ATTTAAAGTTTATGAAACATGAACTGCAG-CTGCAGGATTCTGGCATTTTGCATGCCATTCTCCATCAGATC 
3960 3970 3980 3990 4000 4010 4020 

1990 2000 2010 2020 2030 2040 2050 2060 

AGATAAAACAACCAAAAGCCTTGACTCCGGTACTAATTCTCCCTGCCGGCCCCCGTAAGCATAACGCGGCGA 

III I II I I III I lllll I III I I II III I I II 

TGGGATGATGGCTCAGAACATGTACACAG — ACTAAGAGTAACTG-TGTGATCTGT TAAGGGGTGGA 

4030 4040 4050 4060 4070 4080 4090 

2070 2080 2090 2100 2110 2120 

TCTCCACTTTAAGAACCT— GGCCGC-GTTCTGCCTGGTCTCGCTTTCGTAAACGGTTCTTACAAAAGTAAT 

I II II I Ml II II II II III I I II llll II I III 

T-AACATAATATGCAGCTTAGGATGCTATTTTGAGATGTAT-GAT ATCAGTTCATTC — ACCTGAT 

4100 4110 4120 4130 4140 4150 

2130 2140 2150 2160 2170 2180 2190 

TAGTTCTTGCTTTCAGCCTCCAAGCTTCTGCTAGTCTATGGCAG-CATCAAGGCTGGT-ATT-TGCTACGGC 

II I III II llll III II II llll I I II I II III I II II 

TACT— TTGGTTGCAGC — ACAA-CTGTATATATTGTATAACCGAAATTGATTATTTTCATTGTCCTTATGC 

4160 4170 4180 4190 4200 4210 4220 

2200 2210 2220 2230 2240 2250 2260 

— TGACCGCTA CGCCGCCGCAATAAGGGTACTGGGCGGCCCGTCGAAGGCCCTTTGGTTTCAGAAAC 

m ii ii mm mi i i i ii mi i i m 

AGTGATTTATAATTAGAGCATGTTTAATAAGTTTACTATTCTTGTTAACTA— GTCATTTGACTGGAAAAAA 
4230 4240 4250 4260 4270 4280 4290 

2270 2280 2290 2300 2310 2320 2330 

CCAAGGCCCCCCTCATACCAACGTTTCGACTTTGATTCTTGCCGGTACGTGGTGGTGGGTGCCTTAGCTCTT 

II I I I II I I 

ATAAAATACTTTTAAATGGAAAAAAAAAAAAAAAAAAA 
4300 4310 4320 X 


2340 2350 

TCTCGATAGTTAGAC 


10. ELLIS-012-FIG2AB.SEQ (1-2350)

Q25975 MH mutant porcine ryanodine receptor cDNA.

ID Q25975 standard; DNA; 15377 BP.
AC Q25975;
DT 08-JAN-1993 (first entry)
DE MH mutant porcine ryanodine receptor cDNA.
KW MH; RYR1; calcium release channel; sarcoplasmic reticulum;
KW transverse tubule; Pietrain; Yorkshire; polymorphism; beta strand; ss.
OS Synthetic.
FH Key Location/Qualifiers
FT CDS 130..15237
FT /*tag= a
FT variation 207
FT /*tag= b
FT /label= Polymorphic_site
FT variation 405
FT /*tag= c
FT /label= Polymorphic_site
FT variation 438
FT /*tag= d
FT /label= Polymorphic_site
FT variation 876
FT /*tag= e
FT /label= Polymorphic_site
FT variation 1329
FT /*tag= f


FT variation 1972
FT /*tag= g
FT /label= MH_mutation
FT variation 2007
FT /*tag= h
FT /label= Polymorphic_site
FT variation 4071
FT /*tag= i
FT /label= Polymorphic_site
FT variation 4383
FT /*tag= j
FT /label= Polymorphic_site
FT variation 4462
FT /*tag= k
FT /label= Polymorphic_site
FT variation 4494
FT /*tag= l
FT /label= Polymorphic_site
FT variation 6867
FT /*tag= m
FT /label= Polymorphic_site
FT variation 7692
FT /*tag= n
FT /label= Polymorphic_site
FT variation 8940
FT /*tag= o
FT /label= Polymorphic_site
FT variation 9192
FT /*tag= p
FT /label= Polymorphic_site
FT variation 9585
FT /*tag= q
FT /label= Polymorphic_site
FT variation 9600
FT /*tag= r
FT /label= Polymorphic_site
FT variation 9951
FT /*tag= s
FT /label= Polymorphic_site
FT variation 10111
FT /*tag= t
FT /label= Polymorphic_site
FT variation 11250
FT /*tag= u
FT /label= Polymorphic_site
FT variation 12300
FT /*tag= v
FT /label= Polymorphic_site
FT variation 14007
FT /*tag= w
FT /label= Polymorphic_site
FT polyA_signal 15355..15360
FT /*tag= x
PN WO9211387-A.
PD 09-JUL-1992.
PF 20-DEC-1991; CA0457.
PR 21-DEC-1990; GB-027869.
PR 20-MAY-1991; GB-010865.
PR 09-SEP-1991; GB-019250.
PA (UYGU-) UNIV GUELPH.
PA (UTOR ) UNIV TORONTO INNOVATIONS FOUND.
PI MacLennan DH, O'Brien PJ;
DR WPI; 92-250106/30.
DR P-PSDB; R25450.
PT Purified DNA mol. for diagnosis of porcine malignant hyperthermia
PT [illegible]
PT receptor with specified endonuclease restriction map
PS Disclosure; Fig 2; 96pp; English.

CC The sequence given is the mutant pig ryanodine receptor (RYR1) gene
CC from swine cDNA. The polymorphic sites were observed in comparisons
CC of Pietrain and Yorkshire breeds. There are 17 polymorphisms between
CC the two breeds. The polymorphism at position 1972 causes a mutation
CC from Arg to Cys and this is thought to be the molecular basis of
CC porcine malignant hyperthermia (MH). This mutation lies within the
CC region of RYR1 that is concerned with the binding of regulators of Ca2+
CC release channel gating. Analysis of surrounding sequences suggests
CC that this mutation lies within a beta strand domain comprising roughly
CC amino acids 520 to 830. RYR1 is the calcium release channel of the
CC sarcoplasmic reticulum and is a large protein which spans the gap
CC between the transverse tubule and the sarcoplasmic reticulum. The
CC channel is activated by ATP, calcium, caffeine, and micro-molar
CC ryanodine. It is inhibited by ruthenium red, tetracaine, calmodulin,
CC high Mg2+ and ryanodine.

SQ Sequence 15377 BP; 3197 A; 4630 C; 4755 G; 2774 T; 

SQ 21 Others; 
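The record above gives the coding region as starting at nucleotide 130 and places the Arg to Cys change at nucleotide 1972. The sketch below converts a nucleotide position into a codon number, assuming 1-based coordinates and a CDS that begins in frame at its first listed base. The function name `codon_of` is hypothetical and the calculation is illustrative arithmetic, not something stated in the record.

```python
def codon_of(nt_pos, cds_start):
    """Map a 1-based nucleotide position to (codon number, base within codon).

    Assumes the CDS starts in frame at cds_start; both results are 1-based.
    """
    offset = nt_pos - cds_start
    if offset < 0:
        raise ValueError("position lies upstream of the CDS")
    return offset // 3 + 1, offset % 3 + 1

# CDS start 130 and variation at 1972, both taken from the feature table.
print(codon_of(1972, 130))
```

Under these assumptions position 1972 is the first base of codon 615, which falls inside the amino acid 520 to 830 region that the CC lines describe.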


Initial Score = 134 Optimized Score = 994 Significance = 6.94
Residue Identity = 48% Matches = 1215 Mismatches = 988
Gaps = 320 Conservative Substitutions = 0


X 10 20 

ATGTCCATG — AACTGCTGAGT 

II I II II II III 

TTCGAGTAGGGGATGACCTCATCCTCGTCAGTGTCTCCTCTGAGCGTTACCTGCACCTGTCGACAGC-CAGT 
620 630 640 650 660 670 680 

30 40 50 60 70 80 90 

GGATAAACAGCACG-GGATATCTCTGTCTAAAGGA-ATATTACTACACCAGGAAAAGGACACATTCGACAAC 

II I I II I II III II I I I I I I I I II I II II I I I 

GG-GGAGCTCCAGGTTGACGCCTCCTTC— ATGCAGACACT-GTGGAACATG—AACCCCATCTGCTCTGGC 
690 700 710 720 730 740 750 

100 110 120 130 140 150 

AGGAAAGGAGCC — TGTCACAGAAAACCACAGTGTCCTGTGCATGTGACATTTCGCCATGG — GA-AAC — A 

i in it i iii ii i iii iiiii ii i i ii mu ii i 

TGTGAAGAAGGCTATGTGACTGGGGGTCAC — GTCCTCCGCCTCTTTCACGGACACATGGATGAGTGCCTG 
760 770 780 790 800 810 820 

160 170 180 190 200 210 220 

ACTGT-TACAACG-TGGTGGTCATTGTGCTGCTGCTAGTGGGCTGTGAGAAGGTGGGAGCCGTGCAGAACTC 

ii i i i ii ii ii i ii i ii i i i ii i iii mi i in i n i 

ACCATCTCCCCCGCTGACAGTGA-TGACCAGCGCAGACTTGTCTACTACGAGGGKGGATCTGTG-TGCACCC 
830 840 850 860 870 880 890 


230 240 250 260 270 280 290 

CTG — TGATAACTGT-CAGCCTGGTACTTTCTGCAGAAAATACAATCCAGTCTGCAAGAGCTGCCCT — CC 

i i i ii i ii mi i i n n in i in i i nil n i 

ACGCCCGCTCCCTCTGGAGACTGGAA CCGCTGAGAATCAGCTGGAGTGGGAGCCACCTGCGCTGGGGC 


900 910 920 930 940 950 960

300 310 320 330 340 350



AAGTACCTTC — TCCAGCATAGGTGGACAGCCGAACTGTAACATCTGCAGAGTG-TGTGCAGG — CTATTT 

ii mi in mi n n in in i n i i i nil n i 

CAGCCGCTTCGCATCCGGCAT — GT-CACCACCGGGAGGTACCTGGCGCTCATCGAGGACCAGGGCCTGGTG 
970 980 990 1000 1010 1020 


360 370 380 390 400 410 420 

CAGGTTCAAG-AAGTTTTGCTCCTCTACCCA— CAACGCGGAGTGTGAGTGCATT— GAAGGATTCCATTG 

mi i i n n n i in i ii i i in iiiii iiiii i i 

GTGGTTGATGCCAGCAAGGC-CCAC-ACCAAGGCCACCTCCTTCTGTTTCCGCATTTCCAAGGAGAAGCTGG 
1030 1040 1050 1060 1070 1080 1090 



430 440 450 460 470 480 

CTTGGGGCCAC-AGTG-CACCAGATGTGAAAAGG ACTG CAGGCCTGGCCAGGAGCTAACGAAGC 

I II II I II I il I III II III II I III till I II II 

ATACGGCCCCCAAGCGGGACGTGGAGGGCATGGGCCCCCCTGAGATCAAGTATGG — GGAG-TCACTGTGC 
1100 1110 1120 1130 1140 1150 1160 

490 500 510 520 530 540 550 

AGGGTTGCAAAACCTGTAGCTTGGGAACATTTAATGACCAGAACGGTACTGGCGTCTGTCGACCCTG-GACG 
I III I I III II Ml I I I III I I I I I I lllll I I 
TTCG-TGC--AGCATGTGGCCTCGGGCCTGTGGCTTACCTATGCTGCCCCAG-ACCCCAAGGCCCTGCGGCT 


1170 1180 1190 1200 1210 1220 1230

560 570 580 590 600 610 620


AACTGCTCTCTAGACGGAAGGTC — TGTGCTTAAGACCGGGACCACGGA-GAAGGACGTGGTGTGTGGACCC 

i in in mu i i m ii ii n m ii i ii ii i i mu 

CGGCGTGCTCAAGA-AGAAGGCCATTCTGCACCAGGAAGGCCACATGGACGATGCAC-TG-TCACT-GACCC 


1240 1250 1260 1270 1280 1290 1300

630 640 650 660 670 680 690


CCTGTGGTGAGCTTCTCTCCCA-GTACCAC-CAT-TTCTGT — GACTCCAGAGGGAGGACCAGGAGGGCACT 

mi ii mu i i i i m m i m i i i in mi 

GCTGTCAGCAGGAGGAGTCCCAGGCRGCCCGCATGATCTATAGCACTGCTG-GCCTCTACAA CCACT 

1310 1320 1330 1340 1350 1360 

700 710 720 730 740 750 760 

CCTTGCAGGTCCTTACCTTGTTCCTGGCGCTGACATCGGCTTTGCTGCT — GGCCCTGATCTTCATTACTCT 

i i hi m i m m ii i i i m ii mu i ii i ii 

TCATCAAGGGCCTGGACAGCTTC— AGCG— GAAAGCCACGGGGCT-CTGGGGCCCCGGCTGGCACAGCGCT 
1370 1380 1390 1400 1410 1420 1430 

770 780 790 800 810 820 830 

CCTGTTCTCTGTGCTCAAATGGATCAGGAAAAAATTCCCCCACATATTCAAGCAACCAT-TTAAGAAGACCA 

I 11 1 III II 1 I I I II 1 I II III II I 1111 II 

ACCCCTCGAGGGCGTCATCCTGAGCCTGCAGGACCTCATCGGCTACTTTGAGCCGCCCTCGGAAGAGCTGCA 
1440 1450 1460 1470 1480 1490 1500 


840 850 860 870 880 890 900 

-CTGGAGCAGCTCAAGAGGAAGATGCTTGTAG-CTGC-CGATGTCCACAGGAAGAAGAAGGAGGAGGAGGAG 

1 III II 1111 III III I II 1111 I I III II I! 1 1 1 1 II I III 
GOACGAGGAGAAGCAGAGCAAGCTGC — GCAGCCTGCGCAACCGCCAGAGCCTCTTCCAGGAGGAGG-GGAT 
1510 1520 1530 1540 1550 1560 1570 


910 920 930 940 950 960 970 

GCTATGAGCTGTGATGTACTATCCTAGGAGATGTGTGGGCCGAAACCGAGA-AGCACTAGGACCCCACCATC 

m i in n n iii n in in i i mi i inn i 

GCTCT-CCCTG GTCCT-GAATTGCA — TTGACCGCCTAAATGTCTACACCACT-GCTGCCCACTTTG 

1580 1590 1600 1610 1620 1630 

980 990 1000 1010 1020 1030 

CTGTG-GAACAGCACA— AGCAACCCCACCACCCTGTTCTTACA CATCATCCTAGATGATGTGTGGGC 

HI I III I I II I II 1 1111 II I I I II 1111 I III 

CTGAGTTTGCAGGAGAGGAGGCAGCCGA— GTCCTGGAAAGAGATTGTGAACCTGCTGTATGAGATCCTGGC 
1640 1650 1660 1670 1680 1690 1700 

1040 1050 1060 1070 1080 1090 1100 

GCGCACCTCATCCAAGTC — TC-TTCTAACGCTAACATATTTGTCTTTACCTTTTTTAAATCTTTTTTTAAA 

i i n mi i i n i i in i i i in n ii n i i in 

-CTC-TCTGATCCGTGGCAATCGTGCCAACTGTGCCCT-TTTCTC— CAACAACTTGGATTGGCTGGTCA— 
1710 1720 1730 1740 1750 1760 1770 

1110 1120 1130 1140 1150 1160 1170 

TTTAAATTTTAT-GTGTGTGAGTGTTTTGCCTGCCTGTATGC-ACACGTG-TGT-GTGTGTGTGTGTGTGAC 

II I II 1 II III I III I I II I I III III lllll II III 

-GCAAGCTGGATCGACTG-GAGGCCT — CCT-CAGGGATCCTGGAGGTGCTGTACTGTGT-CCTGATTGAG 
1780 1790 1800 1810 1820 1830 



1180 1190 1200 1210 1220 1230 1240 

ACTCCTGATG-CCTGAGGAGGTCA — GAAGAGAA-AGGGTTGGTTCCATAAGAACTGGAGTTATGGA-TGGC 

i mm i mu i m i mu i urn i n n mi n 

AGTCCTGAGGTCCTGA— ACATCATCCAGGAGAACCACATCAAGTCCAT— CATCT-CCCTTCTGGACAAGC 
1840 1850 1860 1870 1880 1890 1900 

1250 1260 1270 1280 1290 1300 

-TGTGAGCCGGNNNGATAGGT-CGGGA CGGAG — ACCTGTCTTCTTATTTTAACGTGACTGTATAATA 

ii hi i mi i iii iii mu ii i m i n i i 

ATGGGAG— GAACCACAAGGTGCTGGATGTCCTGTGTTCCCTGTGTGTGTGCAATGGTGTGGCCGTGYGCTC 


1910 1920 1930 1940 1950 1960 1970

1310 1320 1330 1340 1350 1360 1370


AAAAAAAAATGATATTTC-GGGAA-TTGTAGAGATTGTCCTGACACCCTTCT — AGTTAA — TGATCTAAG 

II II II III I I III III I III II II II II II I III | 
CAACCAAGATCTCATTACTGAGAACTTGCTS-CCTGGCCGCGAGCTTCTGCTGCAGACAAACCTCATCAACT 
1980 1990 2000 2010 2020 2030 2040 


1380 1390 1400 1410 1420 1430 1440 

AGGAATTGTTGATACGTAGTATACTGTATATGTGTATGTATATGTATATGTATATATAAGACTCTTTTACTG 

II II II I II II I III I I I I I I I I II III I 

ATGTCACCAGCATCCGCCCCA-AC — ATCTTTGTGGGCCGA-GCAGAGG — GCACCACAC— AGTACAG 
2050 2060 2070 2080 2090 2100 

1450 1460 1470 1480 1490 1500 1510 

TCAAAGTCAACCTAGAGTGTC-TGGT-TACCAGGTCAATTTTATT— GGACATTTTACGTCACACACACACA 

Mil II I III III INI II I II II III Mil I I Mil III I 

-CAAATGGTACTTTGAG-GTCATGGTGGACGAAGT-GGTTCCATTCCTGACAGCTCAGGCCACCCACCTGCG 
2110 2120 2130 2140 2150 2160 2170 

1520 1530 1540 1550 1560 1570 

CACACAC — ACACACAC — ACGTTTATACTACGTA — CTGTTATCG--GTATTCTACGTCATATAATGGGA 

I I I III I I II I I II III III II I I II III 
GGTGGGCTGGGCCCTCACCGAAGGCTACAGCCCCTACCCTGGGGGCGGCGAGGGCTGGGGC-GGCAACGGGG 
2180 2190 2200 2210 2220 2230 2240 

1580 1590 1600 1610 1620 1630 1640 

TAGGGTAAAA — GGAAACCAAAG--AGTGA-GTGATATTATTGTGGAGGTGACAGACTACCCCTTCTGGGT 

i ii i i i ii ii iii ii i i mi mi i ii i mi 

TCGGCGATGACCTCTATTCCTACGGCTTTGACGGGCTGCATCTCTGGACAGGACA-CGTGCCACGCCTGGTG 
2250 2260 2270 2280 2290 2300 2310 

1650 1660 1670 1680 1690 1700 

ACGT AGGGACAGACCTCCTTCGGACTGTCTAAAACTCCCCTTAGAAGT-CTCGTCAAGTTCCCGGA-C 

ii i mi in i inn i i nil n n n i i n in i 

ACTTCCCCAGGG-CAG— CACCTTCTGGC CCCCGAGGACGTGGTCAGCTGCTGCCTGGACC 

2320 2330 2340 2350 2360 2370 

1710 1720 1730 1740 1750 1760 1770 

GAAGAGGACAGAGGAGACACAGTCCG-AAAAGTTATTTTTCCG-GCAAATCCTTTCCCTGTTTCGTGACACT 

II I I I III 1111 II I III III I I I I I I I I II 
TCAGCGTGCCGTCCA-TCTCCTTCCGCATCAACGGCTGCCCCGTGCAGGGCGTCTTCGAG-GCCTTCAACCT 
2380 2390 2400 2410 2420 2430 2440 

1780 1790 1800 1810 1820 1830 1840 

CCAC — CCCTTGTGGACACTTGAGTGTCATCCT-TGCGCCGGAAGGTC-AGGT— GGTACCCGTCTGTAGG 

in i in i ii mi i i i mu in mi in n n i 11 

CAACGGGCTCTTCTTCCCCGT CGTCAGCTTCTCGGCCGG — TGTCAAGGTGCGGTTCC — TCCTTGGG 

2450 2460 2470 2480 2490 2500 2510 

1850 1860 1870 1880 1890 

GGCGG GGAGA — CAGAG CCGCGGGGGAGCTACGAGAATCGACTCACAGG--GCGCCCCGGG 

in i n n ii n n i i mm I I II III I II I 

GGCCGCCACGGCGAATTCA-AGTTCCTCCCTCCGCCTGGCTACGCCCCTTGCCAC-GAGGCTGTGCTC 

2520 2530 2540 2550 2560 2570 



1900 1910 1920 1930 1940 1950 1960 

CTTCGCAAATGAAACTTTTTTAATCTCA-CAAGTTTCGTCCGGGCTCGGCGGACCTATGGCGTCGATCCTTA 

i ii i ii i i i i i ii mi n m i it n ii i m 

CCACGAGAGCGACTCCGTCTGGAACCCATCAAG— GAGTATCGGCGAGAAGGGCCCCGGGGACCCCACCTGG 
2580 2590 2600 2610 2620 2630 2640 

1970 1980 1990 2000 2010 2020 2030 

T-TACCTTATCCTGGCGCC-AAGATAAAACAACCAAAAGCCTTGACTCCGGT ACT AATTCTCC 

I II 1 II I III II I I II III II Hill III III I I 

TGGGCCCCAGCC-GCTGCCTCTCACACACCGACTTTGTGCCCTG--CCCGGTGGACACTGTCCAGATTGT-C 

2650 2660 2670 2680 2690 2700 2710 

2040 2050 2060 2070 2080 2090 

CTGCCGGCCC-CCGTAAGCATAACGCGGCG-ATCT-CCACTTTAAGAACCTGGCCGCGTTCTGCCTGGTCTC 

Hill II! I HI II I II II II II Hill I I ilil I I I 

CTGCCTCCCCATCTGGAGCGTATCCGGGAGAAGCTGGCA GAGAACATCCATGAACTCTGGGCGCTGAC 

2720 2730 2740 2750 2760 2770 2780

2100 2110 2120 2130 2140 2150 2160 

GC-TTTCGTAAACGGTTCTTACAAAAG — TAATTAGTTCTTGCTTTCAGC — CTCCA — AGCTTCTGCTAGT 

II III I II I III III I I Mil I! II I III I I 

GCGCATCGAGCAGGGCTGGACCTATGGCCCGGTTCGGGATGACAATAAGCGGCTGCACCCGTGTCTCGTGGA 

2790 2800 2810 2820 2830 2840 2850 

2170 2180 2190 2200 2210 2220

CTATGGCAGCATC — AAGGCTGGTATTTGCTACGGCTGACCGCTACGCCGCCGCAATAAGGGT ACTG 

II III! II II I I I III I I I I I II I I I I I III 
CTTCCACAGCCTCCCGGAGCCCGAGAGGAATTACAACCTGCAG— ATGTCGGGGGA— GACGCTCAAGACTC 
2860 2870 2880 2890 2900 2910 2920 

2230 2240 2250 2260 2270 2280 2290 2300 

GGCGGCCCGTCGAAGGCCCTTTGGTTTCAGAAACCCAAGGCCCCCCTCATACCAACGTTTCGACTTTGATTC 

II I I I I HI III II III II 1111 I II II I 

TGCTGGCGCTGGGCTGCCACGTGG — GCA-TGGCGGACGAGAAGGCAGAGGACAACCTGAGGAAGACGAAAC 
2930 2940 2950 2960 2970 2980 2990 

2310 2320 2330 2340 2350 

TTGCCGGTACGT-GGTGGTGGGTGCCTTAGCT-CTTTC — TCGATAGTTAGAC 

I II nil II I I II I I I I I II I I III 

TCCCCAAGACGTACATGAT — GAGCAATGGGTACAAGCCAGCGCCA-CTGGACCTGAGCCATGTGAGACTGA 
3000 3010 3020 3030 3040 X 3050 3060 

CGCCTGCGCAGACCACGCTGGTGGACCGGCT 
3070 3080 3090 


11. ELLIS-012-FIG2AB.SEQ (1-2350)
Q14755 FUS2 gene.


ID Q14755 standard; DNA; 2492 BP.
AC Q14755;
DT 03-FEB-1992 (first entry)
DE FUS2 gene.
KW Pheromone inducible yeast promoter; bilateral karyogamy defect;
KW FUS1; BIK1; ds.
OS Saccharomyces cerevisiae.
FH Key Location/Qualifiers
FT CDS 403..2256
FT /*tag= a
FT misc_feature 1..402
FT /*tag= b
FT /label= claim 6
FT misc_feature 1..2253
FT /*tag= c
FT /label= claim 8

PN US5063154-A.




PD 05-NOV-1991.

PF 24-JUN-1988; 212270.

PR 24-JUN-1987; US-066078.

PR 24-JUN-1988; US-212270.

PA (WHIT-) WHITEHEAD INST BIOM.

PI Fink GR, Trueheart J, Elion EA;

DR WPI; 91-346534/47.

DR P-PSDB; R14910.

PT DNA fragment contg. pheromone-inducible yeast promoter - useful

PT for transforming yeast cells to produce foreign proteins, which
PT may be toxic to yeast cells.

PS Disclosure; Fig 5; 23pp; English.

CC Transcription of the FUS2 gene is greatly enhanced by the presence 

CC of the appropriate mating pheromone. The promoter region can 

CC therefore be used for the pheromone inducible expression of proteins 

CC of interest. 

CC See also Q14754. 

SQ Sequence 2492 BP; 911 A; 408 C; 441 G; 732 T; 

Initial Score = 129 Optimized Score = 939 Significance = 6.63 

Residue Identity = 46% Matches = 1136 Mismatches = 1038

Gaps = 282 Conservative Substitutions = 0 
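
The summary figures above can be cross-checked against one another. The reported residue identity appears to equal the number of matches divided by the total number of aligned columns (matches + mismatches + gaps), truncated to a whole percent. This relationship is inferred from the numbers in this report, not taken from the search program's documentation, and the function name below is hypothetical. A minimal sketch:

```python
def percent_identity(matches: int, mismatches: int, gaps: int) -> int:
    """Whole-number percent identity over all aligned columns.

    Assumption (inferred from the report, not documented): the
    denominator counts matches, mismatches and gap positions, and
    the result is truncated rather than rounded.
    """
    aligned_columns = matches + mismatches + gaps
    if aligned_columns == 0:
        raise ValueError("alignment has no columns")
    return 100 * matches // aligned_columns


# Figures reported for the Q14755 alignment in this listing.
print(percent_identity(1136, 1038, 282))
```

The same calculation reproduces the identity figures reported for the N70128 alignment (709, 570, 180) and the N81162 alignment (982, 787, 269) later in this listing, which is why truncation rather than rounding is assumed here.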


X 10 

ATGT — CCATGAACTGCTGAG 


CTATTGTGCCCGCCGCGTCACAAATGCGCCCCGAACTTGTCGCGAAGTTAATCTGAAACAT-ATATGTTACC 
130 140 150 160 170 180 190 


20 30 40 50 60 70 80 

TGGATAAACAGCACGGGATATCTCTGTCTAAAG-GAA TATTACTACA-CCAGGAAAAGGACACATTCG 


I lillill III I I II I III I I II I I II I III I II 
TACTGAAACAGCGCATGTTGGAAAAGACAAAGGTGAAGACGAAGTTGTATATTTAAGATA— GACCCTTTAT 
200 210 220 230 240 250 260 


90 100 110 120 130 140 150 

ACAAC — AGGAAAGGAGCCTGTCACAGAAAACC-ACAGTGTCCTGTGCATGTGACATTTCGCCATGGGAAA 

III I Mil I I i mi II II I II Mil I II till 
ACATCCTTTTGAAAAAATTATTAATGTGGCAACCGTCTTTTATTTGACAAAGTATCTTTTTTCTTTGTGAAA 
270 280 290 300 310 320 330 340 


160 170 180 190 200 210 220 

CAACTGTTACAACGTGGTGGTCATTGT— GCTGCTGCTAGTGGGCTGTGAGAAGGTGGGAGCCGTGCAGAAC 

I I I III I I II II II I I I I I II II II II II 
CCAATTTTA-GGTTTTCTTGTTATAGTAAGTTCTTAAGAAAAAGACAAGA-AAACCCCTTGCGATGTTTAAG 
350 360 370 380 390 400 410 

230 240 250 260 270 280 290 

TCCTGTGATAAC-TGTCAGCCTGGTACTTTCTGCAGAA-AATACAATCCAGTCTGCAAGAG-CTGCCCTCCA 

I I lllll III I II III II II II II II I II Hill II I I 

ACTTCATATAACTTGTACGATTTGAACTATCCGAAAAATGATTCATTAACGCCAATAAGAGACT — ACAAAA 


420 430 440 450 460 470 480

300 310 320 330 340 350 360


AGTAC-CTTCTCCAGCATAGGTGGACAGCCGAACTGTAACATCTGCAGA--GTGTGTGCAGGCTATTTCAGG

I II II II III Nil I I I II III I I I I I III I 
ATGACTATTTTCATAAAAATGATGACAAATTACCAGAAATTGTTAGAAAACCTACGAGAAAGTTAT — CGA 


490 500 510 520 530 540 550

370 380 390 400 410 420 430


TTCAAGAAGTTTTGCTCCTCTACCCACAACGC-GGAGTGTGAGTGCATTGAAGGATTCCATTGCTTGGGGCC 

ii m in i i i ii i ii i ill m i mi i i 

AACATGAAAACAAACTCAACGATAAAAAATTCACGAATAAACGACCA-GCAAGTCTGGACTTGCAT — TCT 
560 570 580 590 600 610 


ACAGTGCA — CCAGATGTGAAAAGGACTGCAGGCCTGGCCAGGAGCTAACGAAGCAGGGTTGCAAAACCTG 

I mi I II II III II I II II till II II Hill || 

ATAGTGGAGAGCCTGAGCAATAAAAAAATTTA — CTCTCC — TATTAACACAGAGATATTTCAAAA--TG 

620 630 640 650 660 670 680 

510 520 530 540 550 560 570 

TAGCTTGGGA — ACATTTAATGACCAGAACGGTACTGGCGTCTGTCGACCCTGGACG-AACTGCTCTCTAGA 

i inn mi i i n i i i i m ii m i n i ii i 

T— CGTGAGACTGAATTTGA-GCCCTCA— GATTCCCAATTCTCCTCACGAGGGATGCAAATTTTATAAAAT 
690 700 710 720 730 740 

580 590 600 610 620 630 

CGGA-AG--GTCTGTGCT-TAAGACCGGGACCACGGAGAAGGACGTGGT--GTGTGGACCCCCTGTGGTGAG 

II I II II I 11 I II III 1 1 II I II | | | III II 

CGTACAGGAGTTTTACCTCTCTGAAGTGGAATA-TTACAATAATTTGTTAACCGCAAATAACGTATACAGAA 

750 760 770 780 790 800 810 820 

640 650 660 670 680 690 

CTTC-TCTCCCAGT — ACCACCATTTCTG — TGA-CTCCAGAGGGAGGA--CCAG-GA-GGGC-ACTCCTTG 

I I III III III I I I II II II I II III II I I III 

AGGCATTGAATAGTGATCCAAGATTCAAGAATAAACTTGTCAAGCTTGATTCAAGTGACGAGCTATTGCTTT 
830 840 850 860 870 880 890 


700 710 720 730 740 750 760 

CAGGTCCTTACCTTGTTCCTGGCGCTGACATCGGCTTTGCTGCTGG-CCCTGATCTTCATTACTCTCCTGTT 
Ii II III II II III II I INI II I I II II I III 
TTGG — GAACATTGACACTATTGCGTCAATCAGC-AAAATACTGGTAACGGCAATAAAAGAC-CTACGGTT 



900 910 920 930 940 950

770 780 790 800 810 820 830


CTCTGTGCTCAAATGGATCAGGAAAAAATTCCCCCACATATTCAAGCAACCATTTAAGAAGACCACTGGAGC 

i ii it i mm ii ii n i i i i n mu i 

AGCCAAGC — AACG — TGGGAAAATGTTGGATGCCA-ATGAATGGCAAAAGATA— TTGACCAAAAATGA 
960 970 980 990 1000 1010 1020 

850 860 870 880 890 900 910 

AGCTCAAGAGGAAGATGCTTGTAGCTGCCGATGTCCACAGGAAGAAGAAGGAGGAGGAGGAGGCTATGAGCT 

i in i i ii i i i ii hi i m ii i mi i i ii ii i 

GGTACAACA-GCAGCTATATTCAACTTTTGATAT-TTCAG— AGGCGTTCGAGCAACA-TTTGTTAAGA-AT 


1030 1040 1050 1060 1070 1080

920 930 940 950 960 970


GTGATGTA-CTATCCTAGGAGATGTGTGGG — CCGAAACCGAGAAGCACTAGGACCCCACCATCCTGTGGA 

II I III I II I III 1 II III I 1 I Mil I II 1 I II 
CAAATCCACCTACACAAGCTATTTTGTTAGCCACCAAAAACAAATGGAACTA-TTTACTACATTAAGGATGA 
1090 1100 1110 1120 1130 1140 1150 

990 1000 1010 1020 1030 1040 

ACAGCACAAGCAACCCCACCAC-CCTGTTCT— TACACATCATCCTAGA-TG-ATGTGT — GGGCGCGCA 

I I I II II II II I I II II III II 1111 I I III 
ATA — AGAATCATTTTTTTAACAAGTGGTATGAATATTGTTTAAAAGAGAGTGGATGTATAAAGTTAGAGGA 
1160 1170 1180 1190 1200 1210 1220 


1050 1060 1070 1080 1090 1100 1110 

CCTCAT-CCAAGTCTCTTCTAACG-CTAACATATTTGTCTTTACCTTTTTTAAATCTTTTTTTAAAT-TTA- 

III III I I II I II II I II II I III III 1111 I I III 
CATATTGAAAAGCCCGATGAAAAGACTGAC-TCAGTGGATTGATACTTTGGAAA-CTTTGGAAAGCTGTTAC 
1230 1240 1250 1260 1270 1280 1290 

1120 1130 1140 1150 1160 1170 

-AATTTTATGTGT-GTGAGTGTTTTGCCTGCCTGTATGC — ACACG-TGTGTGTGTGTGTGTGTGTGACAC 

II I I I I I II II II II II II III I I I I I I I I I II 
GAAGATATTCTTTCGCCAGAATTGGGCTTGAAACTAAGCCCGACAAGAAGAAAATAT-TCTTTATTTTCCAA 
1300 1310 1320 1330 1340 1350 1360 1370 


T — CCTGATGCCTGAGGAGGTCAGAAGAGAAAGGGTTGGTTCCAT-AAGAACTGGAGTTATGGATGGCTGT 

i ii it mi it i n 111 ii min i ii i mi i i i 

TAAGTTAGAAACC-GAGG-TCTCCG-AGTATAAGAGT-AATTCCATGTATAATTTCAGTT — TAACCCCAT 

1380 1390 1400 1410 1420 1430 

1250 1260 1270 1280 1290 1300 1310 

GAGCCGGNNNGATAGGTCGGGACGGAGACCTGTCTTCTTA— TTTTAACGTGACTGTATAATAAAAAAAAAA 

ii i ii n ini i ii i i i mm i i n mi mi 

CAGAGATTATACAAAGTTATGATGAAGATCAGTTTACACACCTTTTAAAACCCCCAGACAAACAAAATAAAA 

1440 1450 1460 1470 1480 1490 1500 

1320 1330 1340 1350 1360 1370 1380 

TGATATTTCGGGAAT-TGTA — GAGA-TTGTCCTGACACCCTTCTAGTTAATGATCT— AAGAGGAATTGT 

mi i i ii i i mi i i ii i i mi i m i n i 

ATATATGTAATGCATCTCGACAAGAGAGTAATTTGGATAATAGTAGAGTTCCTTCTCTTCTTTCTGGATCAT 

1510 1520 1530 1540 1550 1560 1570 

1390 1400 1410 1420 1430 1440 

TGA--TACGTAGTATACTGTATATGTGTATGTATATGTATATGTATATATAAGACT-CTTTTACTGTCAAAG 

ii in n i mu ii i i in i in i i in i i in i 

CGAGTTAC-TACTCAGATGTATCAGGGCTAGAAATTGT-CACTAATACTTCA-ACTGCCTCAGCTGAGATGA 

1580 1590 1600 1610 1620 1630 1640 

1450 1460 1470 1480 1490 1500 1510 

TCAACCT-AGAGTGTCTG-GTTACCAGGTCAATTTTATTGGACATT-TTACGTCACA-CA-CACACACACAC 

i n ii i i n ii i m mm i inn i inn n i i n 

TAAATCTAAAAATGGATGAAGAAACAG — AATTTTTT — ACATTGGCAGATCACATCAGTAAATTCAAGA 

1650 1660 1670 1680 1690 1700 1710 

1520 1530 1540 1550 1560 1570 1580 

ACACACACACACGTTTATACTACGTACTGTTATCGGTATTCTACGTCATATAATGGGATAGGGTAAAAGGAA 

tin mi i n i i i i i m i i i n i in i n 

AAGTAATGAAAGGTTTGT-- TA-GAA-TTAAAAAAGAATTTATTGAAAAACGATCTGTCAGGCATTATTGAT 

1720 1730 1740 1750 1760 1770 1780 

1590 1600 1610 1620 1630 1640 

ACCAAAGAGTGAGTGATATTATTG — TGG AGGTGACAGACTAC CCCTTCT-GGGTA-CGTAGG 

i i n i ii iii ii in mm n i i mini mi i i 

ATC — AGTTTAAGAAGAATAAATGCATGGAAAAAGGTGATCGAGTGCGAACGCCCTTCTCGTGCATTTTTTG 

1790 1800 1810 1820 1830 1840 1850 

1650 1660 1670 1680 1690 1700 1710 

GACAGACCTCCTTCGGACTGTCTA-AAACTCCCCTTAGAAGTCTCGTCAAGTTCCCGGACGAAGAGGACAGA 

II III I 1 II I I I III II 1 II 1 1 1 II 11 I II 1 
CGCACGATAACTT — AATATCGACCATGTGTTCTTCGTACATAGATAAACTGCATGAACAAAAAAATCA-A 
1860 1870 1880 1890 1900 1910 1920 

1720 1730 1740 1750 1760 1770 1780 

GGAGACACAGTCCGAAAAGTTATTTTTC-CG— GCAAATCCTTT — CCCTGTTTCGTGACACTCCACCCCT 

I I III I I I III I I III II II I III II II I III I 
GTA-ACA-ATTTTG-AAACTCACAGAGCTCGAAACAGATGTGATGAACCCACTTGAAAGAATCATAGCCCAT 
1930 1940 1950 1960 1970 1980 

1790 1800 1810 1820 1830 1840 1850 

TGT-GGACACTTGAGTGTCATCCT--TGCGCCGGAAGGTCAGGTGGTACCCGTCTGTAGGGGCGGGGAGAC

III II II I I II II I I III I I II II I I II I I I I 

TGTACTACCGTTAAAAG-CAAACTAAAAGATTTGCAAGCTTACATGTTA TTTTTA CAAGAAAAA 

1990 2000 2010 2020 2030 2040 2050 

1860 1870 1880 1890 1900 1910 1920 

AGAGC-CGCGGGGGAGCTACGAGAATCGACTCACAGGGCGCCCCGGGCTTCGCAAATGAAACTTTTTTAATC 
I III II III II I I 1111 III I III II 1111 I II 

AAAGCAAATGTGCGAGATATTAAACGTGACTTGTTGGG — AATGCATTTC-CA AAAC— CTGCAA-- 

2060 2070 2080 2090 2100 2110 

1930 1940 1950 1960 1970 1980 1990


TCACAAGTTTCGTCCGGGCTCGGCGGAC-CTATGGCGTCGATCCTTATTACCTTATC-CTGGCGCCAAGATA 

ii it i ii i 111 i in i i mu i i mi i in 

-AACCAGATGAAAAGGGAATTACCGGTCTTTATTACTTTGATCC-CACGATACTATCGAATGTATCTTGTTG 
2120 2130 2140 2150 2160 2170 2180 

2000 2010 2020 2030 2040 2050 2060 

AAACAACCAAAAGCCTTGACTCCGGTACTAATT CTCCCTGCCGGCCCCCGTAAGCATAACGCGGCGAT 

ii i mi m ii i ii ii ii m ii mi i ii i n i 

AACTATATCAAAGTCTT--CT — TAAAATATTTGGAAATCATTGCTGG — TGGAAAAAAATAC-CTGCAAA 
2190 2200 2210 2220 2230 2240 

2070 2080 2090 2100 2110 2120 2130 

CTCCACTTTAAGAACCTGGCCGCGTTCTGCCTGGTCTCGCTTTCG--TAAACGGTTCTTACAAAAGTAATTA 

m ii ii ii it i ii ii ii m ii m ii mi i i i 

AAGATCTTGAA-AATAT-GTCTCTTAATGACT-CTATAGCTACCGGCCAAA — TT AAAAATCTTGA 

2250 2260 2270 2280 2290 2300 

2140 2150 2160 2170 2180 2190 

GTTCTTGC— TTTCAGCCT — CCAAGCT — TCTG-CTAGTC-TATGGCAGCATCAAGGCTGGTATTTGCT 

i mi i i i n n i i i n i i i nil i i in in m i 

TATTTTGCAGTGTTATTCTAAATCACGATATATATGACAAAACGCATGGTAAGA-AAAGATTGGCCTTTCC- 
2310 2320 2330 2340 2350 2360 2370 

2200 2210 2220 2230 2240 2250 

ACGGCTGACCGCTA — CGCCGCC — GCAATAAGGGTACTGGGCGGCCCGT — CGAAGGCCCTTTGGTTTCAG 

i i mi m i m i i n in n i i i n i ii n n 

-CTGGAGACC-CTAGTGGAAGCCGTGTTGTCAGAAAACTTTTCGAACTTTAACAAAAG-AGTATATTT--AG 
2380 2390 2400 2410 2420 2430 2440 

2260 2270 2280 2290 2300 2310 2320 2330 

AAACCCAAGGCCCCCCTCATACCAACGTTTCGACTT-TGATTCTTGCCGGTACGTGGTGGTGGGTGCCTTAG

III I II II llli 1 II 1 II III II I 

— CTTATAG TTTTTA-GAATGTTTTGTTTTGTTTTTTACTAAAGTA-GTACT 

2450 2460 2470 2480 2490 X 

2340 2350 

CTCTTTCTCGATAGTTAGAC 


12. ELLIS-012-FIG2AB.SEQ (1-2350) 

N70128 Novel DNA encoding a polypeptide having mouse gran

ID N70128 standard; DNA; 1363 BP.

AC N70128;

DT 22-OCT-1990 (first entry)

DE Novel DNA encoding a polypeptide having mouse granulocyte
DE colony-stimulating factor (G-CSF) activity is new
KW Mouse granulocyte colony stimulating factor; lymphokine; interleukin.
OS Mouse. 

FH Key Location/Qualifiers 

FT CDS 68.. 157 

FT /*tag= a 

FT /product=Leader peptide 

FT mat_peptide 158.. 694

FT /*tag= b

PN J62269693-A.

PD 24-NOV-1987.

PF 19-MAY-1986; 112506.

PR 19-MAY-1986; JP-112506.

PA (CHUS) Chugai Pharmaceutical Kk. 

DR WPI; 88-004545/01. 

DR P-PSDB; P70114.

PT New deoxyribonucleic acid -

PT is prepd. by forming mRNA from mammal cells producing
PT polypeptide(s) with mouse granulocyte colony stimulating factor



PS Disclosure; Fig 1(A) Page 491; 12pp; Japanese. 

CC The CDS for the mature peptide (see FT) is claimed (claims 5 and 6). It

CC was prepd. as follows. mRNA is prepd. from mammal cells capable of

CC producing polypeptides having G-CSF activity and double stranded cDNA is
CC produced from the mRNA by conventional methods. Polypeptides having mouse
CC G-CSF activity are obtd. as 14-75S fractions by the sucrose
CC density-gradient centrifugation method.

SQ Sequence 1363 BP; 279 A; 403 C; 368 G; 313 T;

Initial Score = 127 Optimized Score = 587 Significance = 6.51

Residue Identity = 48% Matches = 709 Mismatches = 570

Gaps = 180 Conservative Substitutions = 0 

310 320 330 340 350 X 360 370 

TCCAGCATAGGTGGACAGCCGAACTGTAACATCTGCAGAGTGTGTGCAGGCTATTTCAGGTTCAAGAAGTTT 

III I I I II I 

GTATAAAGGCCCCCTGGAGCTG 
X 10 20 

380 390 400 410 420 430 440 

TGCTCCTCTACCCACAACGCGGAGTGTGAGTGCA-TTGA — AGGATTCCATTGCTTGGGGCCAC— AGTGCA 

ii ill ii i M iii it ii i i ii iiiii m i n m 

GGC-CCT — GGCAGAGCCCAGAGCTGCAGCCCAGATCACCCAGAATCCATGGCT CAACTTTCTGC- 


30 40 50 60 70 80

450 460 470 480 490 500 510


CCAGATGTGAA — AAGGACTGCAGGCCTGGCCAGGAGCTAACGAAGCAGGGTTGCAAAACCTGTAGCTTGGG 

iiiii in iii ii mi i hi in i m i mi i n i i 

CCAGAGGCGCATGAAG — CTAATGGCCCTG-CAGCTGCTGCTGTGGCAAAG-TGCACTATGGTCAGGACGAG 
90 100 110 120 130 140 150 

520 530 540 550 560 570 580 

AACATTTAATGACCAGAACGGTACTGGCG-TCTGTCGACCCTGGACGAACTGCTCTCTAGACGGAAGGTCTG 

1 1 I 11 I I I III Mil I III I INI INI I Mil I 

AGGCCGT— TCCCCTGGTCACTGTCAGCGCTCTG-CCACCAT CCCTGC-CTCTGCCCCGAAGCTTCC 

160 170 180 190 200 210 

590 600 610 620 630 640 

TGCTTAAG-ACCGGGACCACG-GA-GAAGGACGTGGTGTGTGGACCCCCTG-TGGT-GAGC— TTCTCTCCC 

1 1 1 1 1 1 1 1 ii m ii i ii mi i n i n i i n n i nil n i i n 

TGCTTAAGTCCCTGGAGC-AAGTGAGGAAGATCCAGGCCAGCGG-CTCGGTGCTGCTGGAGCAGTTGTGTGCC 


220 230 240 250 260 270

650 660 670 680 690 700


AGTACCACCATTTCTGTGACTCC-AGAGGGAGGACCAGGAGGG-CACTCCTTGCAGGTC CTTACC 

I I II I I III II II II I II I III IIIII II I II III II 
A— CCTACAAGCTGTGTCACCCCGAGGAGCTGGTGTTGCTGGGCCACTCTCTGGGGATCCCGAAGGCTTCCC 
290 300 310 320 330 340 350 

710 720 730 740 750 760 770 

T-TGTTCCTGGCGCT-GACATCGGCTTTGCTGCTGGCCCTGATCTTCATTACTCTCCTGTTCTCTGTGCTCA 

i n n n n i n in in n i ii i i i ii i n i n mi 

TGAGTGGCT-GCTCTAGCCA— GGCCCTGCAGCAGACACAG — TGCCTAAGCCAGCTCCACAGTGGGCTC- 
360 370 380 390 400 410 420 

780 790 800 810 820 830 840 850 

AATGGATCAGGAAAAAATTCCCCCACATATTCAAGCAACCATTTAAGAAGACCACTGGAGCAGCTCAAGAGG 

II II I II II II 11 II I 1 1111 I I 1111 I II I 

--TGCCTC— TACCAAGGTCTCCTGCAGGCTCTATCGGGTATTTCCCCTG— CCCTGG — CCCCCACCTTG 
430 440 450 460 470 480 

860 870 880 890 900 910 

AAGATGCTTGTAGCTGCCGATGT — CC-ACA — GGAAGAAGAAGGAGG-AGGAGGAGGCTATGAGCT-GTGA 

I IIIII 1 1 II I IIIII INI till II II II II I III I I 

GACTTGCTT-CAGCTG — GATGTTGCCAACTTTGCCACCACCATCTGGCAGCAGATGG— AAAACCTAGGGG 

nan cnn con con c/in ccn 


920 930 940 950 960 970 980 

TGTACTATCCTAGGAGATGTGTGGGCCGAAACCGAGAAGCACTAGGACCCCACCATCCTGTGGAACAGC-AC 

I! I III! till III I II III III Mill III Mil I 

TGGCC — CCTA CTGTG-CAGCCCACACAGAG-CGCCATGCCAGCCTTCACTTCTGCCTTCCAGCGCC 

560 570 580 590 600 610 

990 1000 1010 1020 1030 1040 1050 

AAGCA — A CCCCACCACCCTGTTCTTACACATCATCCTAGATGATGTGTGGGCGCGCACCTCATCCA 

in i ii in n i i n i mi n n i in n i n in 

GGGCAGGAGGTGTCCTGGCCATTTCGTACCTGCAGGGCTTCCTGGA-GACGGCTCGCCTTGCTCTGCA-CCA 
620 630 640 650 660 670 680 

1060 1070 1080 1090 1100 1110 1120 

AGTCTCTTCTA-ACGCTAACATATTTGTCTTT — ACCTTTTTTAAATCTTTTTTTAAATTTAAATTTTATGT 

i i in n i n i mi i i ini in i i mm linn 

CTTGGC — CTAGACCTGAGCAGAAAGCCCTTTCCAGATAGTTTA TTTATCTCTATTTAATATTTATGC 

690 700 710 720 730 740 750 

1130 1140 1150 1160 1170 1180 1190 

GTGTGAGTGTTTTGCCTGC-CTGTATGCACACGTGTGTGTGTGTGTGTGTGTGACACTCCTGAT — GCCTGA 

i i n mi i i n in ii n i i i n in n 

ATAT TTAAGCCTACTATTTAAAGACAAAGACGAGAAAATGGAGCTCTAAGCTTCTAGATCATTCTCT 

760 770 780 790 800 810 

1200 1210 1220 1230 1240 1250 

GGAGGTCAGAAGAGAAAGGGTTGGTTC-CATAAGAACTGGAGTTATGGATGGCT-GTGAG-CCGGNNNGATA 

I II II 1 II III! II I I II I II 1111 II III I 

CCACTTCCGA GTTTTGTTCTCCTGCTTAGAGCAGAGAGAGAAGGCTCTTGTGTCCTCCTGTGGA 

820 830 840 850 860 870 880 

1260 1270 1280 1290 1300 1310 1320 

GGTC-GGGACGGAGACCTGT— CTTCTTA-TTTTAA — CGTGACTG-TATAATAAAAAAAAAATGATATTTC 

II I Ml Mil II I I I I II I I II III I I I I I I I I 

GGCCAGGGAAGGAGATGGGTAAATACCAAGTATTGATTCCTG-CTGCTGCTCCAGGCACCCAGTTCTGTGGC 
890 900 910 920 930 940 950 

1330 1340 1350 1360 1370 1380 

GGGAATTGTAGAGATT— GT CCTGACACCCTTCTAGTTAATGATCTAA— GAGGAATTGTTGATACGT 

ii i i i i ii mi n i n i nn i i n i ii 

AGT ACCCCCAAAAAAT CAGTGAGCCCTG — CCGTGCTGAGGCACCATCTCAGGGGGGCCCAGGCAGCATCT 
960 970 980 990 1000 1010 1020 

1390 1400 1410 1420 1430 1440 1450 

AGTATACTGTATATGTGTATGTATATGTATATGT-ATATATAAGACTCTTTTACTGTCAAA — GTCAACCTA 

n i i i ii i n i i i nn in n i i mi i i n 

GGTCTCCCTTCCGGGGGACAAGACATCCCTGTTTAATATTTAA-ACAGCAGTGTTCCCAAACTGGGTTCTTA 
1030 1040 1050 1060 1070 1080 1090 


1460 1470 1480 1490 1500 1510 1520 

GAGTGTCTGGTTACCAGGTCAATTTTATTGGACATTTTAC-GT-CACACACACACACACACAC-ACACACAC 

II II I I I Mill III 1 III I II I Ml II I 1 I I I I II 
TA-TCCCTTGCT— CTGGTCAACCAGGTTGCAGGGTTTCCTGTCCTCACAGGAACGAAGTCCCTAAAGAAAC 
1100 1110 1120 1130 1140 1150 1160 

1530 1540 1550 1560 1570 1580 1590 

ACGTTTATACTACGTACTGTTATCGGTATTCTACGTCATATAATGGGATAGGGTAAAAGGAAACC--AAAGA

II I I III I III III II II I I Mil I II III 
AC-TGGCAGCCAGGT-TTAGCCCCGGAATT-GACTGGAT-TCCTTTTTTAGGG-CCCTGCTGGCCTGGAAGT 
1170 1180 1190 1200 1210 1220 

1600 1610 1620 1630 1640 1650 1660 

GTGAGTGATATTATTGTGGAGGTGACA-GACTACCCCTTCTGGGTACGTAGGGACAGA--CCTCCTTCGGA

Mil I Ml I II II I III III II II I II Mil I 

TGGAGTG GGGGGCAGAGGAGGCAGGAGGAAGCCTGGGGGGGGGGTTGGCATGGAGGGAGGCCTTCCCA 

1240 1250 1260 1270 1280 1290


1670 1680 1690 1700 1710 1720 X 

— CTGTCTAAAACTCC-CCTTAGAAGTC TCGTCAAGTTCCCGGACGAAGAGGACAGAGGAGACACAGT 

i ii i mi n i in ii mi i mi n i i in i i 

TCCACCCTCACCCTCCACCCCACCTGTCACTATAGCCAAGCTTGCGGA-TAATA-AAGTGTGGTGTTCC 
1300 1310 1320 1330 1340 1350 1360 X 

1730 1740 1750 1760 1770 

CCGAAAAGTTATTTTTCCGGCAAATCCTTTCCCTGTTTCGTGACACT 


13. ELLIS-012-FIG2AB.SEQ (1-2350) 

N81162 Encodes Western subtype of early summer meningoenc 

ID N81162 standard; DNA; 2418 BP.

AC N81162;

DT 26-OCT-1990 (first entry)

DE Encodes Western subtype of early summer meningoencephalitis (ESME).

KW early summer meningoencephalitis virus; live vaccines; ds.

OS Early summer meningoencephalitis virus. 

FH Key Location/Qualifiers 

FT CDS 113.. 460 

FT /*tag= a 

FT /product=protein C 

FT CDS 461.. 727 

FT /*tag= b

FT /product=protein prM

FT CDS 728.. 952

FT /*tag= c

FT /product=protein C

FT CDS 953.. 2418

FT /*tag= d

FT /product=protein E 

PN EP-284791-A. 

PD 05-DEC-1988. 

PF 29-FEB-1988; 103003.

PR 20-MAR-1987; EP-104114.

PA (IMHU-) Immuno Chem Med AG.

PI Heinz FX, Kunz C, Mandl C, Dorner F, Bodemer W;

DR WPI; 88-294138/42.

DR P-PSDB; P80573, P82324, P82325 & P82326.

PT New DNA and RNA mols encoding proteins of meningoencephalitis virus -
PT useful in vaccines, diagnostic agents and detection probes
PS Disclosure; p; German.

CC Encodes all the structural proteins of ESME virus. The invention
CC covers fragments of this sequence and analogous RNA molecules.

CC Corresponding mRNA sequence given in specification.

SQ Sequence 2418 BP; 635 A; 507 C; 743 G; 533 T;

Initial Score = 126 Optimized Score = 790 Significance = 6.45

Residue Identity = 48% Matches = 982 Mismatches = 787

Gaps = 269 Conservative Substitutions = 0 

X 10 

ATG-TCCA — TGAACTGCTGAG 

III II II III I I 

GGTGAGGAAAGAAAGGGATGGCTCAACTGTGATCAGAGCTGAAGGAAAGGATGCAGCAACTCAGGTGC-GTG 
470 480 490 500 510 X 520 530 

20 30 40 50 60 70 80 

TGGATAAACAGCAC — GGGATATCTCTGTCTAAAGGAATATTACTACACCAGGAAAAG-GACACATTCGACA 

mi ii mi i i in in mi i ii n in in n i 

TGGA-GAATGGCACCTGTGTGATC-CTGGCTACTG— ACATGGGGTCATGGTGTGATGATTCACTGTC-CTA 
540 550 560 570 580 590 600 


ACAGGAAAGGAGCCT — GTCA — CAGAAAACCACAGTGTCCTG — TGCATGTGACATTTCG-CCATGGGAA 

ii ii ii 111 mi ii 111 ii ii i ii i ii mi 

TGAGTGTGTGACCATAGATCAAGGAGAAGAGCCTGTTGACGTGGATTGTTTTTGCCGGAACGTTGATGGAGT 
610 620 630 640 650 660 670 

160 170 180 190 200 210 220 

ACAACTGTTACAACGTGGTGGTCATTGTGCTGCTGCTAGTGGGCTGTGAGAAGGTGGGAGCCGTGCAGAAC- 

I III I III I I I II I I I II 1 I II III I I 1 till II I 

CTATCTG-GAGTACG-GACG— CTGTGGGAAACAGGAAG-GCTCACGGACAAG — GCGCTCAGTGCTGATCC 
680 690 700 710 720 730 740 

230 240 250 260 270 280 290 

— TCCTGTGATAACTGTCAGCCTGGTACTTTCTGCAGAAAATACAATCCAGTCTGCAAGAGCTGCCCTCCAA 

ill II I I I II III II I II I I I II I III II I 
CATCCCATGCTCAGGGAGAG-CTG-- ACGGGAAGGGGACACAAATGGCTAGAAGG— AGACTCGCTGCGAAC 
750 760 770 780 790 800

300 310 320 330 340 350 

GTACCTT-CTCCAGCATAGG-TG-GACAG-CCGAAC-TGTAACAT--CTGCAGAGTGTGTGCAGGCTATTT 

lllll II II III II I I I llll I II I II I II II I I I I 
ACACCTTACTAGAGTTGAGGGATGGGTCTGGAAGAACAAGCTACTTGCCTTGGCAATG-GTTACCGTTGTGT 
810 820 830 840 850 860 870 

360 370 380 390 400 410 420

CAGGTT — CAAGAAG-TTTTGCTCCTCTACCCACAACGCGGAGTGTG-AGTGCATTGAAGGATTCCATTGC 

mi i i n i n i in n i n nil i i n i 

— GGTTGACCCTGGAGAGTGTGGT GACCAGGGTTGCCGTTCTTGTTGTGCTCCTGTGTTTGGCACCGG 

880 890 900 910 920 930 940

430 440 450 460 470 480 

TTGGGGC — CACAGTGCAC-CAGATGTGAAAA — GGACTGCAGGCCTGG — CCAGGAGCTAACGAAG — C 

n n ii inn n n inn inn i nil nil i mi i i 

TTTACGCTTCGCGTTGCACACACTTG-GAAAACAGGGACTTTGTGACTGGTACTCAGGGGACTACGAGGGTC 
950 960 970 980 990 1000 1010 

490 500 510 520 530 540 550 

A — GGGTTGCAAAACCT — GTAGCTTGGGAACATTTAATGACCAGAACGGTACTGGCGTCTGTCGA-CCCTG 

I I III I II II 1 I I II I III I I II II 1 1 1 II I II II 

ACCTTGGTGCTGGAACTGGGTGGATGTGTTAC-TATAA-CAGCTGAGGGGAA— GCCTTCAATGGATGTGTG 
1020 1030 1040 1050 1060 1070 1080 

560 570 580 590 600 610 620 

GACGAACTGCTCTCTAGACGGAAGGTCTGTGCTTAAGAC-CGGGACCACGG— AGAAGGACGTGGTGTGTGG 

i n n i n i i n i in mm II II II I I II I III II 

GCTTGAC-GCCATTTA-CCAGGAGAACCCTGC-TAAGACACGTGAGTACTGTTTACACGCCAAGTTGTCT-G 
1090 1100 1110 1120 1130 1140 1150 


630 640 650 660 670 680 

ACCCCCTG— TG — GTGAGCTTCTCTCCCA— GTACCA-CCATTTCTGTGACTCCAGAGGGAGGACCAGGAG 

n i i n i n i i i i i i nn in n n n n n mm 

ACACTAAGGTTGCAGCCAGATGCCCAACAATGGGACCAGCCA — CTTTGGCT — GA-AGAACACCAGGGT 
1160 1170 1180 1190 1200 1210 

690 700 710 720 730 740 750 

GGCACTCCTTGCA— GGTCCTTACCTTGTTCCTGGCGCTGACATCGGCTTTG-CTGCTGGCCCTGATCTTC 

lllll II I I II II II III 1 II I II II III I I I I 

GGCACAGTGTGTAAGAGAGATCAGAGTGATCGAGGCTGGGGCAACCACTGTGGACTGTTTGGAAAGGGTAGC 
1220 1230 1240 1250 1260 1270 1280 

760 770 780 790 800 810 

ATTACTCTCCTGTTCTCTGTGC-TCAAATGGATCAGGAAAAAATTCCCC—CACATATTCAAG CAACC 

in i nil i n n i n n m n n i nil i i i nil 

ATT-GTGGCCTG-TGTCAAGGCGGCTTGTGAGGCAAAAAAGAAAGCCACAGGACATGTGTACGACGCCAACA 
1290 1300 1310 1320 1330 1340 1350 


ATTTA-AGAAGAC — CACTGGAGCAGCTCAAGAGGAAGATGCT-TGTAGCTGCCGATGTCCACAGGA-AGA 

I II I I li II I II I II II III II III HIM II I II I I 

AAATAGTGTACACGGTCAAAGTCGAACCACACACGGGAGA— CTATGT — TGCCG CAAACGAGACA 

1360 1370 1380 1390 1400 1410 1420 

890 900 910 920 930 940 950 

AGAAGGAGGAGGAGGAGGCTATGAGCTGTGATGTACTATCCTAGGAGATGTGTGGGCCGAAACCGAGAAGCA 

ii linn ii i ii i i i ii ii ii i i i i i i i i ii i 

CATAGTGGGAGGAAGACGGCAT-CCTTCACAATTTCT-TCAGAGAAAACCATTTTGAC-TATGGGTG-AGTA 
1430 1440 1450 1460 1470 1480 

960 970 980 990 1000 1010 1020 

CTAGGACCCCACCATCCTGTGGAACAGCACAAGCAACCCCACCACCCTGTTCTTACACATCATCCTAGATGA 

ii i i mi i i i ii i ii i i ii i mini ii i 

TGGAGATGTGTCTTTGTTGTGCAGGGTCGCTAG — TGGCGTTGACTTGGCCCAGAC-CGTCATCCTTGA-GC 


1490 1500 1510 1520 1530 1540 1550

1030 1040 1050 1060 1070 1080 1090


TGTGTGGGCGCGCACCTCA-TCCAAGTCTCTTCTAAC-GCT-AACATATTTGTCTTTACCTTTTTTAA— AT 

i ii ii mini mi in in n i i in n n n 

T-TG ACAAGACAGTGGAACAC-CTTCCAACGGCTTGGCAGGTCCATAGGGA-CTGGTTCAATGAT 

1560 1570 1580 1590 1600 1610 

1100 1110 1120 1130 1140 1150 1160 

CT-TTTTTTAAATTTAAATTTTA-TGTGTGTGAGTGTTTTGCCTGCCTGTATGCACACGTGTGTGTGTGTGT 

II II II III I I I I 1 I I I I 1 I I II II II I II I 
CTGGCTCTGCCATGGAAACATGAGGGAGCGCAAAACTGGAACAACGCAGAAAG-ACTGGT-TGAATTTG-GG 
1620 1630 1640 1650 1660 1670 1680 

1170 1180 1190 1200 1210 1220 1230 

GTGTGTGACAC— TCCTGATG— CCTG-AGGAGGTC— AGAAGAGAAAGGGTTGGTTCCATAAGAACTGGAG 

I I III II 1111 I II I I II III Hi II I III I III II I 
GCTCCTCACGCTGTCAAGATGGACGTGTACAACCTCGGAGACCAGACTGGAGT-GTTACTGAAG-GCTCTCG 
1690 1700 1710 1720 1730 1740 1750 


1240 1250 1260 1270 1280 1290 1300 

TTATGGATGGCTGTGAGCCGGNNNGATAGGTCGGGACGGAGACCTGTCTTCTTA-TTTTAACGTGA-CTGTA 

I II I Hill II II I II II II I II II llllll III 
CT-GGGGTTCCTGTG-GC — ACACAT-TGAGGGAACCAAGTACCACCTGAAGAGTGGCCACGTGACCTGCG 
1760 1770 1780 1790 1800 1810 1820 

1310 1320 1330 1340 1350 1360 

TAATAAAA AAAAAATG-ATATTTCGGGAATT— GTAGAGATTGTCCTGACACCCTTCTAGTTAATGAT 

i i i inn n i ii n n in i i in inn nil i n 

AAGTGGGACTGGAAAAACTGAAGATGAAAGGTCTTACGTACACAATGT-GTGACA-AAACAAAGTTCA-CAT 
1830 1840 1850 1860 1870 1880 1890

1370 1380 1390 1400 1410 1420 1430 

CTAAGAGGAATTGTTGATAC-GT AGTATACTGTATATGTGTATGT-ATATGTATATGTATATATAAGA 

inn i i n n i mi n n i n i n i i n i i i in 

GGAAGAGAGCTCCAACAGACAGTGGGCATGATACAGTGGTCATGGAAGTCACAT-TCTCTGGA-ACA-AAG- 


1900 1910 1920 1930 1940 1950 1960

1440 1450 1460 1470 1480 1490



CTCTTTTACTGTCAAAGTCAACCTAGAGTGTC-TGG-TTACCAG — GTCAATTTTATTGGACAT TTTA 

i n n n inn n i i in i nil n n in in i i 

CCCT-GTAGGATCCCAGTCAGGGCAGTGGCACATGGATCTCCAGATGTGAA CGTGGCCATGCTGATAA 

1970 1980 1990 2000 2010 2020 

1500 1510 1520 1530 1540 1550 1560 

CGTCACACACACACACACACACACACACACACGT — TT-ATA-CTACGTA-CTGTTATC— GGTATTCTAC 

II II II II III I III I I II II III III III I II II II 
CGCCAAACCCA-ACAATTGAAAACA-ATGGAGGTGGCTTCATAGAGATGCAGCTGCCCCCAGGGGAT--AAC 
2030 2040 2050 2060 2070 2080 2090 

1570 1580 1590 1600 1610 1620


GTCA--TATAATGGG--ATAGGGTAA--AAGGAAACCAAAGAGTGAGTGATATT ATTGTGGAGG--TGACA

II! Ill III! I Mil II I Hill II III I II I I III I II 
ATCATCTATGTTGGGGAACTGAGTCATCAATGGTTCCAAAAAGGGAG — CAGCATCG-GAAGGGTTTTCCA 
2100 2110 2120 2130 2140 2150 2160 

1630 1640 1650 1660 1670 1680 1690 

GACTACCCCTTCTGGGTACGTAGGGACAGACCTCCTTCGGACTGTCTAAAACTCCCCTTAGAAGTCTCGTCA 

I III II I II II III II I! I I II I III II II II 

AAAGACCAAGAAAGG— CATA — GAAAGA-CTGACAGTGATAG— GAGAGCACGCCTGGGA— CT--TC- 
2170 2180 2190 2200 2210 2220 

1700 1710 1720 1730 1740 1750 1760 

AGTTCCCGGACGAAGAGGACAGAGGAGACACAGTCCGAAAAGTTATTTTTCCGGCAAATCCTT — TCCCTGT 

III II III I III II I I III I I III Hill I I I 

GGTT-CTGCTGGAGGCTTTCTGAG TTCAATTGGGAAGGCGGTACATACGG TCCTTGGTGGCGCT 

2230 2240 2250 2260 2270 2280 

1770 1780 1790 1800 1810 1820 1830 

TTCGTGACACTCCACCCCTTGTGGACACTTGAGTGT — CA — TCCTTGCGCCGGAAGGTCAGGTGGTACCCG 

III III II I I III I I I II I II III I II III I I 
TTC--AACA — GCATCTTCGGGGGAGTGGGGTTTCTACCAAAACTTTTATTAGGAGTGGCA-TTGG — CTTG 
2290 2300 2310 2320 2330 2340 

1840 1850 1860 1870 1880 1890 1900 X 

TCTGTAGGGGCGGGGA-GACAGAGCCGCGGGGGAGCTACGAGAATCGACTCACAGGGCGCCCCGGGCTTCGC 

II II I I II I I II I I I III I II II I I II 1111 I 
GTTG— GGCCTGAACATGAGAAACCCTACAATG-TCCATGAGCTTTCTCTTGGCTGGAGGTCTGGTCTTCCC 
2350 2360 2370 2380 2390 2400 2410 X 

1910 1920 1930 1940 1950 

AAATGAAACTTTTTTAATCTCACAAGTTTCGTCCGGGCTCGGCGGACCTA 


14. ELLIS-012-FIG2AB.SEQ (1-2350)
Q35297 ZYMV genome.


ID Q35297 standard; DNA; 9593 BP.
AC Q35297;
DT 28-MAY-1993 (first entry)
DE ZYMV genome.
KW Zucchini yellow mosaic virus; ZYMV; potyvirus; polyprotein; protease;
KW proteolytic activity; 49 kD protease; trypsin-like cysteine protease;
KW animal picornavirus; scissile bond; NIb; protein; coat; ss.
OS Zucchini yellow mosaic virus.
FH Key Location/Qualifiers
FT 5'UTR 1..139
FT /*tag= a
FT CDS 140..9382
FT /*tag= b
FT misc_feature 2437..2438
FT /*tag= c
FT /note= "Cleavage site between aphid transmission
FT helper component (HC) and the 46 kD protein"
FT misc_feature 3631..3632
FT /*tag= d
FT /note= "Cleavage site between 46 kD protein and the
FT cytoplasmic inclusion protein (CI)"
FT misc_feature 5533..5534
FT /*tag= e
FT /note= "Cleavage site between CI and VPg/protease (VPg
FT and protease are probably not separated in
FT ZYMV)"
FT misc_feature 6991..6992
FT /*tag= f
FT /note= "Cleavage site between VPg/protease and RNA



FT   misc_feature    8542..8543
FT                   /*tag= g
FT                   /note= "Cleavage site between REP and the coat
FT                   protein (CP)"
FT   misc_feature    9382
FT                   /*tag= h
FT                   /note= "Polyprotein termination point"
FT   3'UTR           9383..9593
FT                   /*tag= i

PN   WO9301305-A.
PD   21-JAN-1993.
PF   09-JUL-1992; U05745.
PR   09-JUL-1991; US-727837.
PA   (BALI/) BALINT R.
PI   Balint R;
DR   WPI; 93-045506/05.
DR   P-PSDB; R35081.
PT   Method for identifying protease inhibitors - useful for drugs
PT   screening for treating e.g. chronic inflammation, metastatic
PT   cancers and viral infections
PS   Disclosure; Fig 4; 62pp; English.

CC   This sequence represents the nucleotide sequence of the zucchini yellow
CC   mosaic virus (ZYMV) genome. ZYMV is a potyvirus and expresses its
CC   genome as a single 350 kD polyprotein which is cleaved into at least
CC   seven mature gene products by three distinct proteolytic activities.
CC   Two of the proteases are virus encoded, including the potyviral 49 kD
CC   protease. This protease is responsible for at least five of the seven
CC   cleavages. This enzyme is a trypsin-like cysteine protease which is
CC   structurally and mechanistically representative of the largest class
CC   of viral proteases, including those of the animal picornaviruses.
CC   This enzyme is highly specific and appears to recognise a region
CC   comprised of about seven amino acids surrounding the scissile bond. Of
CC   the five sites cleaved by this enzyme, the two flanking the protease
CC   appear to be cleaved intramolecularly, while the remaining three
CC   appear to be cleaved intermolecularly. Of the latter three, the site
CC   between the NIb protein and the coat protein appears to be the most
CC   active. The polyprotein sequence encoded by this genome is not
CC   given in the specification but is deduced in R35081.

SQ   Sequence 9593 BP; 2995 A; 1844 C; 2258 G; 2496 T;
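The sequence summary line above gives a total length and per-base counts, so a scanned record can be sanity-checked by confirming that the counts add up to the stated length. The following is a minimal sketch of that check, not part of the FastDB output; the function name `parse_sq_line` and the regular expressions are illustrative assumptions about the line layout shown here.

```python
import re


def parse_sq_line(line):
    """Return (length, {base: count}) from a sequence summary line.

    Assumes the layout seen in this report:
    'Sequence <length> BP; <n> A; <n> C; <n> G; <n> T;'
    """
    length = int(re.search(r"Sequence\s+(\d+)\s+BP", line).group(1))
    # Single-letter base codes followed by ';' ("BP;" is not matched).
    counts = {base: int(n) for n, base in re.findall(r"(\d+)\s+([ACGT]);", line)}
    return length, counts


length, counts = parse_sq_line("Sequence 9593 BP; 2995 A; 1844 C; 2258 G; 2496 T;")
consistent = sum(counts.values()) == length
```

For the ZYMV record the four counts do sum to the stated 9593 bases, which suggests the digits on that line survived scanning intact.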

Initial Score    =    124  Optimized Score =    977  Significance =  6.32
Residue Identity =    47%  Matches         =   1213  Mismatches   =   969
Gaps             =    353  Conservative Substitutions        =     0
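The report does not state how the residue identity figure is derived. The values printed in this section are, however, consistent with the number of matches divided by the total of matches, mismatches and gaps, truncated to a whole percentage. The sketch below encodes that reading as an assumption; `residue_identity` is an illustrative name, not a FastDB function.

```python
def residue_identity(matches, mismatches, gaps):
    """Percent identity over all aligned columns, truncated to an integer.

    Assumed formula: 100 * matches / (matches + mismatches + gaps).
    """
    return (100 * matches) // (matches + mismatches + gaps)


# Alignment against the ZYMV genome: 1213 matches, 969 mismatches, 353 gaps.
zymv_identity = residue_identity(1213, 969, 353)
```

Under this assumption the ZYMV alignment evaluates to 47, and the odorant receptor alignment later in the report (487 matches, 361 mismatches, 137 gaps) evaluates to 49.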


X 10 20 

AT— GTCCATGAACTGCTGAGT 

II III I III Mil I 

GTCAAGGATTTATTCACTTCTGGTGTTGAAACACAGAGCAAGCGAGAAAGATGGGTCTACGAA-AGCTGTGA 
6960 6970 6980 6990 7000 7010 7020 

30 40 50 60 70 80 

GGATAAACAGCACGGGATAT — CTCTGTCTAA— AGGAATA-TTACTACACCAG— GAAAAGGACACAT-T 

i ii i mi i i in si n ii i ii i ii i i ii i ii i i 

AGGGAACC — TTCGGGCTGTTGGAACTGCACAATCAGCGTTAGTCACCAAACATGTTGTGAAAGGCAAGTGT 
7030 7040 7050 7060 7070 7080 7090 

90 100 110 120 130 140 150 

CGACAACAGGAAAGGAGCCTGTCACAGAAAACCACAGTGTCCTGTGCATGTGACATTTCGC-CATGGGAAAC 

i iii ii i i mi i i iii i mi i i ii ii min i 

CCTTTCTTCGAA — GAATAT-TTACAAACACACGCAGAAGCGAGCGCCTATTTCAGACCCCTAATGGGAGAG 


7100 7110 7120 7130 7140 7150 7160

160 170 180 190 200 210 220


AACTGTTACAACGTGGTGGTCATTG-TGCTGCTGCTAGTGGG CTGTGAGAAGGTGGGAGCCGTGCAGA 

ii i i i ii ii i ii ii i ii ii i i i i mi ii 

TArrAr.rrr.ArraAGTTniarAiiiriinr— rTTTAAAA4rnATTTrTTT&&ATArAATA4Ar.rrr.T-rA— 


7170 7180 7190 7200 7210 7220 7230 

230 240 250 260 270 280 

ACTCCTGTGA — TAACTGTCAGCCTGGT-ACTTTCTG CAGAAAAT — ACAATCC-- AGTCTGCAAGA 

mi i inn i i n i i m ii in ii ii i i ii n 

CTGTTAACCAACTG-GATCATGATAAATTTTTGGGAGCAGTGGATGGGGTTATACGTATGATGTGTGA 

7240 7250 7260 7270 7280 7290 

290 300 310 320 330 340 

GCTGCCCTCCAAGTACCTTC — TCCAGCATAG'GTGGACAGCCGAACTGT — AAC-ATCTGCAGAGTGTGTGC 

I I III I II I II I II I I 1 III I I III I II II II II 

TTTTGAGTTCAACGAATGTCGATTCATTACAGAT — CCCGAGGAAATTTACAACTCTTTGAACA-TGAAAGC 


7300 7310 7320 7330 7340 7350 7360

350 360 370 380 390 400 410


AGGCTATTTCAGGTTCAAGAAGTTTTGCTCCTCTACCCACAACGCGGAGTGTGAGTGCATTGAAGGATTCCA 

i ii iii i i ii ii i i i i ii i i i i mi i i i i mi 

A-GCAATTGGA-GCCCA GTATAG — AGGAAAGAAGAAAGAGTATTTTGAGGGGCTAG-ATGATT — 

7370 7380 7390 7400 7410 7420 

430 440 450 460 470 480 

TTGCTTG-GGGCCAC AGTGCACCAGATGTGAAAAGGACTGC-AGGCCTGGCCAGGAGCTAACGAA-GC 

mi i i ii ii ii ii ii mini i i i i i n i i i n mi 

TTGATCGAGAGCGACTTTTATTCCA-AAGTTGTGAAAGGTTGTTCAATGGCT-ACAAAGGTCTGTGGAATGG 
7430 7440 7450 7460 7470 7480 7490 

490 500 510 520 530 540 

AGGGTT GCAAAAC-CTG-TAGCTTGGGAACATTTAATG— ACCAGAAC-GGTAC — TGGCGTCTGTCG 

i ii iiiiM mu m i ii i i ii in i ii i i i i ii 

ATCTTTAAAGGCCGAGCTCAGGCCGCTTGAGAA-AGTCAGGGCTAACAAAACACGAACCTTTACAGCAG-CG 


7500 7510 7520 7530 7540 7550 7560

550 560 570 580 590 600 610


ACCCTGGACGAACTGCTCTCTAGACGGAAGGTCTGTGCTTAAGACCGGGACCACGGAGAAGGACGTGGTGTG 

I I II I mi 1 II 1! I! llll 1 I! Ill I II I I I 

CCAATTGATACATTGCT — TGGAGCTAAAGTTTGTGTGGATGATTTCAACAATG AGTTC-TACAGGA 

7570 7580 7590 7600 7610 7620 7630 

620 630 640 650 660 670 680 

TGGACCCCCTGTGGTGAGCTTCTCTC-CCAGTACCACCATTTCT-GTGACTCCAGAGGGAGGA — CCAGGAG 

III I III I I II II II I III I III I till I I III 

AAAACCTCAAGTGTCCATGGACGGTCGGCATGACAAAATTTTATGGTGGTT GGGATAAATTGATGAG 

7640 7650 7660 7670 7680 7690 

690 700 710 720 730 740 750 

GGCACTCCTTGCAGG-TCCTTACCTTGTTCCTGGCGCTGACATCGGCTTTGCTGCTGGCCCTGATCTTCATT 

II I I II III II III II II I II III I I I I I I I I II 

ATCATTACCTGATGGTTGGTTGTATTG-TCATGCTGATG-GATC-ACAGTTCGATAGTTCGTTAACCCCA-G 

7700 7710 7720 7730 7740 7750 7760 

760 770 780 790 800 810 

ACTCTCCTGTTCTCTGTGCTCA-AAT — GG ATCAGGAAAA AATTCCCCC-ACATATTCAA 

ii mil ii mini in ii ii in i n n i i n i 

CCT-TACTGAACGCAGTGCTCATAATCAGGTCATTTTATATGGAGGATTGGTGGGTCGGCCAAGAGATGCTT 
7770 7780 7790 7800 7810 7820 7830 

820 830 840 850 860 870 880 

GCAACCATTTAAGAAGACCACTG-GAGCAGCTCAAGAGGAAGATGCTTGTAGCTGCCGATGTCCACAGGAA- 

I 11 1111 I II till lllfl! I I II 1 I I III II II II 

GAAAATCTTTATGCCGA-GATTGTGTACA-CTCCAATTCTTGCTCCTGATGG— AACAATTTTCA-AGAAAT 
7840 7850 7860 7870 7880 7890 7900 

890 900 910 920 930 940 950 

GAAGAAGGAGGAGGAGGAGGCTA — TGAGCTGTGATGTACTATCCTAGGAGATGTGTGGGCCGAAACCGAG 

III I I I II III I III III II I II I I I III III II 

TT AClLfim A A Af* ArTf'f'Tpr A-T AA^ATA^TA - ATf'PTTrTf* ATr-rrT 


7910 7920 7930 7940 7950 7960 

960 970 930 990 1000 1010 

AAGCACTAGGACCCCA — CCATCCTGT-GGAACAGC-ACAAGCAACCCCACCACCCTG — TTC-TTACACAT 

i mi i i! ii ii iiiii n i n i i i n in n n n 

ATTTACTATGCGTGCATGAAATTTGGTTGGAACTGCGAGGAGATTGAGAATAAACTTGTCTTCTTTGCAAAT 
7970 7980 7990 8000 8010 8020 8030 8040 

1020 1030 1040 1050 1060 1070 1080 

CATCCTAGATGATGTGTGGGCGCGCACCTCATCCA — AGTCT — CTTCTAAC — GCTAACATATTTGTCTTT 

mini ii i m m i ii i i ii n i iiiii mi i 

GGAGATGATCTG-ATACTTGCA-GTCAAAGATGAGGATAGCGGCTTACTTGATAACA TGTCATC 

8050 8060 8070 8080 8090 8100 

1090 1100 1110 1120 1130 1140 

ACCTTTTTTAAATCTTTTTTTAAATTTAAATTTT ATGTGTG TGAG-TGTT-TTGCCTGCCT 

mm i m i mi iiiii i ii i m i i ii mi 

CTCTTTTTGCGAACTTGGACTGAATTATGATTTTTCAGAACGTACGCATAAAAGAGAAGATCTTTGGTTCAT 
8110 8120 8130 8140 8150 8160 8170 

1150 1160 1170 1180 1190 1200 1210 

GTATGCACACG — TGTGTGTGTGTGTGTGTGTGACACTCCTGATGCCTGAGGAGGTCAGAAGAGAAAGGGT 

ii ii i ii ii ii i m mini i i m i m iiiii i 

GTCCCACCAAGCAATGCTAGT-TGATGGAATGT-ACACTCC-AAAACTCGAGAA AGA-GAGAA T 

8180 8190 8200 8210 8220 8230 


1220 1230 1240 1250 1260 1270 

TGGTTCCAT--AAGAACTGGAGTTATGGATGG — CTGTGAGCCGGNNNGA — TAGGT-CGGGACGGAGA- 

II III II III III I I I I I II III II II I II || || 
TGTTTCAATTCTAGAGTGGGATAGAAGCAAAGAAATTATGCACCGAACAGAGGCTATTTGCGCTGCGATGAT 
8240 8250 8260 8270 8280 8290 8300 

1280 1290 1300 1310 1320 1330 1340 

CCTGTCTTCTTATTTTAACGTGACTGTATAATAAA-AAAAAAATGATATTTCGGGAATTGTAGAGATTGTCC 
III I II I I II III I III I II I II I INI I I 
TGAGGCATGGGGGCACACCGAGCTCTTGCAAGAAATCAGAAAGTTTTACCTATGG-TTCGTTGAAAAAG--A 
8310 8320 8330 8340 8350 8360 8370 

1350 1360 1370 1380 1390 1400 1410 

TGACACCCTTCTAGTTAATGATCTAAGAGGAATTGTTGATACGTAG— TATACTGTATATGTGTATG-TATA 

II I II I II II II I I II I III I II I I I III I 1111 
AGAGGTGCGAGAATTGGCAGCCCTCGGA— AAAGCTCCATACATAGCTGAGACAGCA-CTTCGTAAGTTATA 
8380 8390 8400 8410 8420 8430 8440 

1420 1430 1440 1450 1460 1470 1480 

— TGTATATGTA-TATATAAGACTCTTTTACTGTCAAAGTCAACCTAGAGTGTCTGGTTACCAGGTCAATT 

II I I I I II II I 1111 1 I I IIIII I II I I III II 

CACTGACAAGGGAGCAGAT — ACAAGTGAACTGGC — ACGCTACCTACAAGCCCTCCAT-CAAGAT — ATC 


8450 8460 8470 8480 8490 8500

1490 1500 1510 1520 1530 1540 1550


TTATTGGA-CATTTTACGTCACACACACACACACACACACACACACACGTTTATACTACGTACTGTTATCGG 

II II II II II 1111 I I I II II III I I III I III 

TTCTTTGAGCA AGGAGACACTGTGATGCTC-CAATCAGGCACTCAGCCA-ACTGTGGCAGATGCTGGA 

8510 8520 8530 8540 8550 8560 8570 

1560 1570 1580 1590 1600 1610 1620 

TATTCTACGTCATATAATGGGATAGGGTAAAAGG-AAACCAAAGA-GTGAGTGATATTATTGTGGAGGTGA- 

IIIII 1111 till III II 111 III II II I I I I llllll 
GCTACAAAGAAAGATAA-AGAAGATGACAAAGGGAAAAACAAGGACGTTA — CAGGCTCCGGCTCAGGTGAG 
8580 8590 8600 8610 8620 8630 8640 

1630 1640 1650 1660 1670 1680 

CAGACTACCCCTTCTGGGTACGTAGGGACA--GACCT-CCTTCGGACTGTC T AAAACTCCCCTT AGAA 

I II I II I 111 I IIIII II I II I I II 1111 I I I 

AAAACAGTAr.rinrT-nTrirn-aicr.iraA^r.ATriTCAfiTcrTnnTTrTrATnncaiiaTTnTcrrnrn-T 



8650 8660 8670 8680 8690 8700 8710 

1690 1700 1710 1720 1730 1740 1750 

GTCTCGTCAAGTTCCCGGACGAAGAGGACAGAGGAQACACAGTCCGAAAAGTTATTT — TTC-CGGCA — AA 

i iii iii it i i iii i i ii i ii i ii mi i n i i i ii i i 

CTTTCG — AAGATCAC-AAAGAAAATGTCA-TTGCCACGC-GT-- GAAAGGAAATGTGATACTCGATATTGA 
8720 8730 8740 8750 8760 8770 

1760 1770 1780 1790 1800 1810 

TCCTTTCCCTGTTTCGTGACACTCCACCCCTTGTG GACACTTGAGTGTCATCCTTGCGCCGGAAGG 

ii iii iii i i i i iii i mi in in n i i i i i 

TCATTT-GCTGGAATATAAACCGGATCAAATTGAGTTATATAACACACGAGCGTC-TCAT— CAGCAGTTCG 
8780 8790 8800 8810 8820 8830 8840 

1820 1830 1840 1850 1860 1870 1880 

TCAGGTGGT — ACC-CGTCTGTAGGGGCGGGGA — GACAGAGCCGCGG— GGGAGCTACGAGAATCGACT 

i mi in n i n n n nil i i nm i i i i i 

CCTCTTGGTTCAACCAGGTTAAGACGGAATATGATTTGAACGAGCAACAGATGGGAGTTGTAATGAATG-GT 
8850 8860 8870 8880 8890 8900 8910 


1890 1900 1910 1920 1930 1940 1950 

CACAGGGCGCCCCGGGCTTCGCAAATGAAACTTTTTTAATCTCACA-AGTTTCGTCCGGGCT-CGGCGGAC 

n i i i n i i mi n n i i in ii n in i i mi 

TTCATG— GTTTGGTGCATTGAGAATGGCAC TTCACCCGACATTAATGGAGTGTGGGTTATGATGGAC 

8920 8930 8940 8950 8960 8970 8980 

1960 1970 1980 1990 2000 2010 2020 

CTATGGCGTCGATCCTTATTACCTTATCC — TGGCGCCAAGATAAAACAACCAAAAGCCTTGACTCCGG-TA 

n n i ii inn n nil i i n mm n in i 

GGAAATGAGC — AAGTTGAGTATCCCTTGAAACCAATAGTTGAAAATGCAAAGCCAACGCTGCGGCAA 

8990 9000 9010 9020 9030 9040 

2030 2040 2050 2060 2070 2080 

CTAATTC-TC— CCTGCCG-GCCCCCGTAAGCATAACGCGGCGATCTCCACTTTAAGAACCTGGCCGCGTTC 

mi i n in i n i mi i i in ii in in mi 

ATAATGCATCATTTTTCAGATGCAGCGGAGGCAT-ATATAGAGAT-GAGAAATGCAGA GGCACCATAC 

9050 9060 9070 9080 9090 9100 9110 

2090 2100 2110 2120 2130 2140 2150 

-TGCC-TGGTCTCGCTTTCGTAAACGGTTCTTACAAAAGTAATTAGTTCTTGCTTTCAGC-CTCCAAGCTTC 

mi in i i n i i n i in i ii n i n in i i nil 

ATGCCGAGGTATGGTTTGCTT — CGAAACCTAC GGGAT-AG GAGTTTAGCACGATATGCTTT 

9120 9130 9140 9150 9160 9170 

2160 2170 2180 2190 2200 2210 2220 

TGCTAGTCTATGGCAGCA — TC-AAGGCTGGTATTTGCTACGGCTGA-CCGCTACGCCGCCGCAATAAG-GG 

n i linn n n n n i i i n i i i i in i i n n i 

TGAT-TTCTATGAAGTCAATTCTAAAACTCCTGAAAGAGCCCGCGAAGCTGTTGCGCAGATGAAAGCAGCAG 
9180 9190 9200 9210 9220 9230 9240 

2230 2240 2250 2260 2270 2280 2290 

TACTGGGC — GGCCCGTCGAAGGCCCTTTGG-TTTCA— GAAACCCAAGGCCCCCCTCATACCAACGTTTC 

n n i i n mi inn n i mi in n n i 11 

CTCTTAGCAATGTTTCTTC-AAGGTTGTTTGGCCTTGATGGAAA — TGTTGCCACC ACTAGCG — AA 

9250 9260 9270 9280 9290 9300 

2300 2310 2320 2330 2340 2350 

GACTTTGATTCTTGC-CGGTACGTGGTG — GTGGGTGCCT-TAGCTCTTTCTCGATAGTTAGAC 

III III III Hill II 1 I I I I III II 1 1 I II 
GACACTGAACGGCACACTGCACGTGATGTTAATAGAAACATGCACACCTTACTAGGTGTGAATACAATGCAG 
9310 9320 9330 9340 9350 9360 9370 X 

TAAAGGGTAGGCCGCCTACCTAGGTTATTGTTTCGCTGCCGAC 
9380 9390 9400 9410 9420 



15. ELLIS-012-FIG2AB.SEQ (1-2350)
Q29860 Odorant receptor clone 17.

ID   Q29860 standard; DNA; 983 BP.
AC   Q29860;
DT   15-MAR-1993 (first entry)
DE   Odorant receptor clone 17.
KW   Odorant receptor; insect; vertebrate; fish; mammal; neurotransmitter;
KW   hormone; G-protein; surface receptor; olfactory epithelium; PCR;
KW   Sprague-Dawley rat; amplify; primer; polymerase chain reaction;
KW   multigene family; ligand binding domain; ss.
OS   Rattus rattus.
PN   WO9217585-A.
PD   15-OCT-1992.
PF   06-APR-1992; U02741.
PR   05-APR-1991; US-681880.
PA   (UYCO ) UNIV COLUMBIA NEW YORK.
PI   Axel R, Buck LB;
DR   WPI; 92-366257/44.
DR   P-PSDB; R27872.
PT   Nucleic acid encoding an odorant receptor - can be used to
PT   control insect populations or for detecting odours e.g. alcohol,
PT   explosives, natural gas etc.
PS   Claim 9; Fig 14; 195pp; English.
CC   The sequences given in Q29855-77 are odorant receptor clones derived
CC   from an insect, a vertebrate, a fish or a mammal. These clones form
CC   a family of neurotransmitters and hormone receptors which transduce
CC   intracellular signals by activation of specific G-proteins. Each
CC   of these receptors is a member of a superfamily of surface receptors
CC   which traverse the membrane seven times. These clones are only
CC   expressed in the olfactory epithelium. These clones were isolated
CC   using probes derived from RNA prepared from the olfactory epithelia
CC   of Sprague-Dawley rats. Isolated cDNAs were amplified using primers
CC   which correspond to transmembrane domains 2 and 7. PCR products of the
CC   appropriate size were isolated and sequenced. The deduced protein
CC   sequences of these cDNAs defined a new multigene family which shared
CC   sequence and structural properties with the superfamily of
CC   neurotransmitter and hormone receptors which traverse the membrane
CC   seven times. This novel family, however, exhibits features different
CC   from any other member of the superfamily identified so far. There is a
CC   striking divergence within the third, fourth and fifth transmembrane
CC   domains between the olfactory proteins. This divergence in the
CC   potential ligand binding domain is consistent with the idea that
CC   the family of molecules cloned is capable of associating with a large
CC   number of odorants of diverse molecular structure.
SQ   Sequence 983 BP; 206 A; 270 C; 214 G; 293 T;


Initial Score    =    123  Optimized Score =    393  Significance =  6.26
Residue Identity =    49%  Matches         =    487  Mismatches   =   361
Gaps             =    137  Conservative Substitutions        =     0


X 10 20 

ATGT CCATGAACTGCTGAGTGG 

i I I III II I 

CAGTGGGAGAGTGAGTGAATTTGTGTTGCTGGGTTTCCCAGCTCCTGCCCCACTGCGAGTACTACTATTTTT 
20 30 40 50 60 70 80 

30 40 50 60 70 80 90 

ATAAACAGCACGGGATATCTCT — GT-CTAAAGGAATA-TTACTACACCAGGAAAAGGACACATTCGACAAC 

I I II III ! I II ! ! Ill I I II II II I II I III I I I 

CCTTTCTCTTCTGGCTATGTGTTGGTGTTGACTGAAAACATGCT-CATCA-TTATAGCA — ATTAGGAACC 
90 100 110 120 130 140 150 

100 110 120 130 140 150 

AGGAAAGGAGCCTGTCACAGAAAACCA — CAGT-GTCCTGTGC — ATGTGACATTTC — GCCAT — GGGAAA 
! I 1 II l ill 1 1 1 1 1 1 it III n 1 1 1 1 n 1 1 1 1 i ii iii 



ACCCA — ACCCT-CCAC — AAACCCATGTATTTTTTCTTGGCTAATATGTCATTTCTGGAGATTTG6TATG 
160 170 180 170 200 210 220 

160 170 180 190 200 210 220 

CAACTGTTACAA— CGT — GGTGGTCATTGTGCTGCTGCTAGTGGGCTGTGAGAAGGTGGGAGCCGTGCAGA 

ilium i i i i ii ii ii iii i i ii i mu m i i i n 

TCACTGTTACGATTCCTAAGATGCTCGCTG-GCTTC-ATTGGT-T CCAAGGAGAACCATQGA-CAG — CTGA 
230 240 250 260 270 280 

230 240 250 260 270 280 

ACTCCTGTG ATAACTGTCAGCCTGGTACTTTCTGCAGAAAATACAATCCAGTCTGCAAGAGCTGC — C 

mu n ii ii ii mm i i i i in i i n i 

TCTCCTTTGAGGCATGCATGACACAACTCTACTTTTTCCTGGGCTTGGGTTGCACAGAGTGTGTCCTTCTTG 
290 300 310 320 330 340 350 360 

290 300 310 320 330 340 350 

CTCCAAGTACCTTCTCCAGCATAGGTGGACAGCCG AACTGTA— ACATCTGCAGAGTGTGT-GCAGG 

ii i m i ii ii mi iii in i n i ii ii n ii n 

CTGTGATGGCCTATGACCGC-TATGTGGCTATCTGTCATCCACTCCACTACCCCGTCATTGTCAGTAGCCGG 
370 380 390 400 410 420 430 

360 370 380 390 400 410 420 

CTATTTCAGGTTCAAGAAGTTTTGCTCCTCTAC-CCACAACGCGGAGTGTGAGTGCATTGAAGGATTCC — A 

mi ii i n i i mi ii n it n n iii mi i 

CTATGTGTGCAGATGGCAG-CTGGATCCTGGGCTGGAGGTTTTGGTATCTCCATG-GTTAAAGTTTTCCTTA 
440 450 460 470 480 490 500 

430 440 450 460 470 480 

TTGCTTG — GGGC-CACAGTGCACCAGATGTGAAAAGGACTGCAGGCCTGGCCAGGAGCTAACGAAGCAGGG 

II II I 1 I II III II 1 I I II 11 I I II III I 

TTTCTCGCCTGTCTTACTGTGGCCCCAACACCATCA — AC — CA — CTTTTTCTG TGATGTGTCTCCA 

510 520 530 540 550 560 

490 500 510 520 530 540 550 

TTGCAAAACCTGTAGCTTGGGAACATTTA-ATGAC — CAGAACGG — TACTGGCGTCTGTCGACCCTGGACG 

mi 1 1 1 n 1 1 i n n i i in i m n i in i i i iii mu i 

TTGCTCAACCTGT — CATG CACTGACATGTCCACAGCACAGCTTACAGAC-TTTGT CCTGG-CC 

570 580 590 600 610 620 

560 570 580 590 600 610 620 

AACTGCTCTCTAGACGGA-AGGTCTGTGCTTAAGACCGGGACCACGGAGAAGGACGT GGTG-TGTGGA 

i i m m i in n ii i in i i i i ii i i mi mi 

ATTTTTATTCTGCTGGGACCGCTCTCTG-TCA — CTGGGGCATCCTACATGGCCATCACAGGTGCTGTGAT 
630 640 650 660 670 680 690 

630 640 650 660 670 680 

CCCCCTGTGGTGAGCTTCTCTCC-CAGTACCACCATTT CTGTGACTCCAGAGGGAGGACCAG--GAGG 

iii i mi n n i n n in inn mi i i i i i 

GCGCATCCCCTCAGCTGCTGGCCGCCATAAAGCCTTTTCAACCTGTGCCTCCCACCTCA CTGTTGTGA 


700 710 720 730 740 750

690 700 710 720 730 740 750


GCA-CTCCT-TGCAG — GTCCTTACCT-TGTTCCTGGCGCTGACATCGGCTTTGCTGCTGGCCCTGATCTT 

n n n inn n n i i i i n m n i n in i i i i 

TCATCTTCTATGCAGCCAGTATTTTCATCTATGCCAGGC-CTAAGGCACTCTCAGCTTTAGACACCAACAAG 
760 770 780 790 800 810 820 830 

760 770 780 790 800 810 820 

CATTACTCTCCTGTTCTCTGTGCT-CAAATGGATCAGGAAAAAATTCCCCCACATA-TTCAAG— CAACCAT 

i i in mi mi m inn in ii ii n i i i inn 

C--TGGTCT-CTGTACTCTACGCTGTC-ATTGTACCGTTGTTCAATCCCATCATCTACTGCTTGCGCAACC-- 
840 850 860 870 880 890 

830 840 850 860 870 880 890 

TTAAGAAGACCACTGGAGCAGCTCAAGAGGAAGATGCTTGTAGCT-GCCGATGTCCACAGGAAG — AAGAAG 

i i i i i ii 


1 1 1 1 mi i i 


i 


i ii i i i ii mi i i i 


1 1 1 1 1 i 


1 1 i 



— AAGATG-TCAAAAGAGC-GCT-ACGTCG — CACGC-TGCACCTGGCCCAGGAC — CAGGAGGCCAATACC 
900 910 920 930 940 950 

900 910 920 930 940 950 960 

GAGGAGGAGGAGGCTATGAGCTGTGATGTACTATCCTAGGAGATGTGTGGGCCGAAACCGAGAAGCACTAGG 

I I III II I I II I 

AACAA— AGGCAGC-AAAATTGGTTAG 
960 970 980 X 


ACCCC 



> 0 < 

0| |0 IntelliGenetics 

> 0 < 

FastDB - Fast Pairwise Comparison of Sequences 
Release 5.4 

Results file ellis-012-fig2ab-pir.res made by shears on Tue 14 Sep 93 15:01:23-PDT.


Query sequence being compared:ELLIS-012-FIG2AB.PEP (1-256) 

Number of sequences searched: 52257 

Number of scores above cutoff: 4100 

Results of the initial comparison of ELLIS-012-FIG2AB.PEP (1-256) with: 
Data bank : PIR 36, all entries 

[Histogram: number of sequences (logarithmic axis, 5 to 100000) plotted against initial score (0 to 256), with a standard-deviation scale beneath; the plotted points are not recoverable from the scan.]


PARAMETERS

Similarity matrix        Unitary    K-tuple                  2
Mismatch penalty               5    Joining penalty         30
Gap penalty                 1.00    Window size             32
Gap size penalty            0.26
Cutoff score                   0
Randomization group            0

Initial scores to save        40    Alignments to save      15
Optimized scores to save       0    Display context         50


SEARCH STATISTICS

Scores:     Mean           Median         Standard Deviation
            4              5              1.51

Times:      CPU            Total Elapsed
            00:03:05.07    00:06:20.00

Number of residues:                15485766
Number of sequences searched:         52257
Number of scores above cutoff:         4100



Cut-off raised to 4. 

Cut-off raised to 5. 

Cut-off raised to 6. 

Cut-off raised to 7. 

The scores below are sorted by initial score.

Significance is calculated based on initial score. 
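The significance values in this report behave like the number of standard deviations by which an initial score exceeds the mean score of the search. The sketch below is an illustration under that assumption rather than FastDB's documented formula; `significance` is a hypothetical helper, and the mean (4) and standard deviation (1.51) are the rounded figures printed under SEARCH STATISTICS for this PIR search.

```python
def significance(initial_score, mean, stdev):
    """Standard deviations by which initial_score lies above the mean.

    Assumed relationship: (initial_score - mean) / stdev.
    """
    return (initial_score - mean) / stdev


# Rounded statistics printed for the PIR search in this report.
PIR_MEAN = 4
PIR_STDEV = 1.51

sig_score_11 = significance(11, PIR_MEAN, PIR_STDEV)
sig_score_10 = significance(10, PIR_MEAN, PIR_STDEV)
sig_score_9 = significance(9, PIR_MEAN, PIR_STDEV)
```

Because the printed mean and standard deviation are rounded, the computed values agree with the reported significances (4.62, 3.96 and 3.30 for initial scores of 11, 10 and 9) only to within a few hundredths.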

A 100% identical sequence to the query sequence was found:

                                                   Init.  Opt.
Sequence Name Description                   Length Score Score   Sig. Frame

 1. B32393    4-1BB protein precursor - Mou    256   256   256 166.38     0

The list of other best scores is: 

                                                   Init.  Opt.
Sequence Name Description                   Length Score Score   Sig. Frame

              **** 4 standard deviations above mean ****
 2. S15785    Heat-stable antigen HSA-C - M    141    11    26   4.62     0
 3. A39046    Tissue factor precursor - Mou    294    11    42   4.62     0
 4. A32318    Tissue factor precursor - Mou    294    11    42   4.62     0
              **** 3 standard deviations above mean ****
 6. S01877    NADH dehydrogenase (ubiquinon     59    10    14   3.96     0
 7. S15784    Heat-stable antigen - Mouse       76    10    19   3.96     0
 8. A43537    Heat stable antigen M1/69-J11     76    10    19   3.96     0
 9. S21969    19K zein precursor (clone ZG3    214    10    19   3.96     0
10. ZIZMA2    19K zein precursor (clone cZ1    230    10    23   3.96     0
11. S03417    19K zein precursor (clone gZ1    234    10    23   3.96     0
12. S21970    19K zein precursor (clone A30    234    10    23   3.96     0
13. ZIZMB1    19K zein precursor (clone cZ1    234    10    23   3.96     0
14. ZIZM3     19K zein precursor (clone A30    234    10    23   3.96     0
15. S15655    Zein, 19K - Maize                235    10    23   3.96     0
16. ZIZM99    19K zein precursor (clone ZG9    235    10    22   3.96     0
17. S07172    19K zein precursor (clone Z4)    267    10    21   3.96     0
18. BWNSV4    Mov-34 protein - Mouse           321    10    38   3.96     0
19. S27672    O-antigen polymerase - Salmon    359    10    21   3.96     0
20. A32118    H+-transporting ATP synthase     465    10    39   3.96     0
21. S01292    Tenascin - Chicken (fragment)    697    10    36   3.96     0
22. C33379    Protenascin 190K precursor -    1535    10    36   3.96     0
23. B32230    Cytotactin precursor 2 - Chic   1537    10    36   3.96     0
24. B33379    Protenascin 200K precursor -    1626    10    36   3.96     0
25. A30903    Protenascin precursor - Chick   1808    10    36   3.96     0
26. A33379    Protenascin 230K precursor -    1808    10    36   3.96     0
27. A32230    Cytotactin precursor - Chicke   1810    10    36   3.96     0
28. B39079    Pre-alpha-inhibitor HC3 chain     18     9     9   3.30     0
29. C34245    Inter-alpha-trypsin inhibitor     20     9     9   3.30     0
30. B25604    Endothelial cell growth facto     49     9     9   3.30     0
31. D31201    GLI-related finger protein HK    106     9    23   3.30     0
32. S12586    Whey acidic protein - Rabbit     127     9    16   3.30     0
33. S01286    Whey acidic protein precursor    127     9    16   3.30     0
34. S03552    Inter-alpha-trypsin inhibitor    147     9     9   3.30     0
35. B30020    Hypothetical protein 6 - Frui    174     9    15   3.30     0
36. S01189    NADH dehydrogenase (ubiquinon    174     9    16   3.30     0
37. S19934    Hypothetical protein - Escher    196     9    28   3.30     0
38. A42337    submandibular gland protein A    206     9    30   3.30     0
39. A25303    Alpha-1-microglobulin precurs    220     9    34   3.30     0
40. TVHST2    Transforming protein (int-2)     245     9    34   3.30     0


1. ELLIS-012-FIG2AB.PEP (1-256)

B32393 4-1BB protein precursor - Mouse

ENTRY           B32393     #Type Protein
TITLE           4-1BB protein precursor - Mouse
DATE            17-Jul-1992 #Sequence 17-Jul-1992 #Text 23-Mar-1993
PLACEMENT       0.0    0.0    0.0    0.0    0.0
SOURCE          Mus musculus #Common-name house mouse
ACCESSION       B32393
REFERENCE
   #Authors     Kwon B.S., Weissman S.M.
   #Journal     Proc. Natl. Acad. Sci. U.S.A. (1989) 86:1963-1967
   #Title       cDNA sequence of two inducible T-cell genes.
   #Reference-number A32393; MUID:89184547
   #Accession   B32393
   #Molecule-type mRNA
   #Residues    1-256 <KWO>
   #Cross-reference GB:J04492
FEATURE
   1-23         #Domain signal sequence (predicted) <SIG>\
   24-256       #Protein 4-1BB protein <MAT>
SUMMARY         #Molecular-weight 27598  #Length 256  #Checksum 4884
SEQUENCE

Initial Score    =    256  Optimized Score =    256  Significance = 166.38
Residue Identity =   100%  Matches         =    256  Mismatches   =     0
Gaps             =      0  Conservative Substitutions        =     0

X       10        20        30        40        50        60        70
MGNNCYNVVVIVLLLVGCEKVGAVQNSCDNCQPGTFCRKYNPVCKSCPPSTFSSIGGQPNCNICRVCAGYFR
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
MGNNCYNVVVIVLLLVGCEKVGAVQNSCDNCQPGTFCRKYNPVCKSCPPSTFSSIGGQPNCNICRVCAGYFR
X       10        20        30        40        50        60        70

      80        90       100       110       120       130       140
FKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTGVCRPWTNCSLDGR
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
FKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTGVCRPWTNCSLDGR
      80        90       100       110       120       130       140

   150       160       170       180       190       200       210
SVLKTGTTEKDVVCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLALIFITLLFSVLKWIRK
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SVLKTGTTEKDVVCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLALIFITLLFSVLKWIRK
   150       160       170       180       190       200       210

 220       230       240       250     X
KFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL
||||||||||||||||||||||||||||||||||||||||
KFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL
 220       230       240       250     X


2. ELLIS-012-FIG2AB.PEP (1-256)

S15785 Heat-stable antigen HSA-C - Mouse

ENTRY           S15785     #Type Protein
TITLE           Heat-stable antigen HSA-C - Mouse
DATE            07-Apr-1992 #Sequence 07-Apr-1992 #Text 07-Apr-1992
PLACEMENT       0.0    0.0    0.0    0.0    0.0
SOURCE          Mus musculus #Common-name house mouse
ACCESSION       S15785
REFERENCE
   #Authors     Wenger R.H., Ayane M., Bose R., Koehler G., Nielsen
                P.J.
   #Journal     Eur. J. Immunol. (1991) 21:1039-1046
   #Title       The genes for a mouse hematopoietic differentiation
                marker called the heat-stable antigen.
   #Reference-number S15783; MUID:91209380
   #Accession   S15785
   #Status      preliminary
   #Residues    1-141 <WEN>
   #Cross-reference EMBL:X56486
SUMMARY         #Molecular-weight 15515  #Length 141  #Checksum 6244
SEQUENCE

Initial Score    =     11  Optimized Score =     26  Significance =  4.62
Residue Identity =    22%  Matches         =     30  Mismatches   =    93
Gaps             =      9  Conservative Substitutions        =     0

90 100 110 120 130 X 140 150 

NAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNBQNGTGVCRPWTNCSLDGRSVLKTGTTE 

I I I 

MGRAMVARLGLGLLLLALLLPT 
X 10 20 


160 170 180 190 200 210 

KDVVCGPPVVSFSPSTTISVTP EGGPGGHSLQVLTLFLAL — TSALLLALIFITLLFSVLKWIRKKF 

I I! Ill II III III II II II I 

91 YCNQTSVAPFSGNQNISASPNPSNATTRGGGSSLQSTAGLLALSSTSLLLETQARKRLYFPIFYTYPKWQ 
30 40 50 60 70 80 90 

220 230 240 250 X 

DUTCKADi:ytfTTCAAfl’:rnArcrDr* , OArcc:rrrrr-vri 



P — a VQCDfiEETGPPR I VC YHTSTENTENSKFDG I KGR VKGLREERCR Y 
100 110 120 130 140 


3. ELLIS-012-FIG2AB.PEP (1-256)

A39046 Tissue factor precursor - Mouse

ENTRY           A39046     #Type Protein
TITLE           Tissue factor precursor - Mouse
DATE            31-Jul-1991 #Sequence 31-Jul-1991 #Text 23-Mar-1993
PLACEMENT       0.0    0.0    0.0    0.0    0.0
SOURCE          Mus musculus #Common-name house mouse
ACCESSION       A39046
REFERENCE
   #Authors     Ranganathan G., Blatti S.P., Subramanian M., Fass
                D.N., Maihle N.J., Getz M.J.
   #Journal     J. Biol. Chem. (1991) 266:496-501
   #Title       Cloning of murine tissue factor and regulation of
                gene expression by transforming growth factor type
                beta1.
   #Reference-number A39046; MUID:91093171
   #Accession   A39046
   #Status      preliminary
   #Molecule-type mRNA
   #Residues    1-294 <RAN>
   #Cross-reference GB:J05713
SUMMARY         #Molecular-weight 32935  #Length 294  #Checksum 8911
SEQUENCE

Initial Score    =     11  Optimized Score =     42  Significance =  4.62
Residue Identity =    23%  Matches         =     58  Mismatches   =   165
Gaps             =     28  Conservative Substitutions        =     0


X 10 20 

MGNNCYNVWIVLLLVGCEK-V 


VRPRLLAALAPTFLGCLLLQVIAGAGIPEKAFNLTWISTDFKTILEUQPKPTNYTYTVQISDRSRNWKNKCF 
10 20 30 40 50 X 60 70 


30 40 50 60 70 80 90 

GAVQNSCDNCQPGTFCRKYNPVCK-SCPPSTFSSIGGQPNCNICRVCAGYFRFKKF-CSSTHNAECECIEGF 

II I I I I I II I I I 

STTDTECDLTDEIVKDVTWAYEAKVLSVPRRNSVHGDGDQLVIHGEEPPFTNAPKFLPYRDTNLGQPVIQQF 
80 90 100 110 120 130 140 

100 110 120 130 140 150 

HCLGPQCTRCEKDCRPG8ELTKQGCKTCSLGTFNDQNG — TGVCRPWTNCSLDGRSVLKTGTTE — KDVVCG 

I II II I I I I I I I I I II I 

EQDGRKLNVVVKD SLT-LVRKNGTFLTLRQVFGKDLGYIITYRKGSSTGKKTNITNTNEFSIDVEEG 

150 160 170 180 190 200 210 

160 170 180 190 200 210 220 

PPVVSFSPSTT I S-VTPEGGPGGHSLQVLT LFLALT SALLLAL IFITLLFSVLKWI RKKFPHI 

i i i it i i i iii m hi it i ii 

VSYCFFVQAMI FSRKTNQNSPG — SSTVCTEQVJKSFLGETLI I VGAWLLAT IF I ILLSI SLCKRRK — NR 
220 230 240 250 260 270 280 

230 X 240 250 

FKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL 


AGQKGKNTPSRLA 
290 X 


4. ELLIS-012-FIG2AB.PEP (1-256)

A32318 Tissue factor precursor - Mouse

ENTRY           A32318     #Type Protein
TITLE           Tissue factor precursor - Mouse
DATE            29-Jan-1990 #Sequence 29-Jan-1990 #Text 23-Mar-1993
PLACEMENT       0.0    0.0    0.0    0.0    0.0
SOURCE          Mus musculus #Common-name house mouse
ACCESSION       A32318
REFERENCE
   #Authors     Hartzell S., Ryder K., Lanahan A., Lau L.F., Nathans
                D.
   #Journal     Mol. Cell. Biol. (1989) 9:2567-2573
   #Title       A growth factor-responsive gene of murine BALB/c 3T3
                cells encodes a protein homologous to human tissue
                factor.
   #Reference-number A32318; MUID:89343974
   #Accession   A32318
   #Status      preliminary
   #Molecule-type mRNA
   #Residues    1-294 <HAR>
   #Comment     This sequence has not been compared to the
                nucleotide translation.
SUMMARY         #Molecular-weight 32923  #Length 294  #Checksum 9197
SEQUENCE

Initial Score    =     11  Optimized Score =     42  Significance =  4.62
Residue Identity =    23%  Matches         =     58  Mismatches   =   165
Gaps             =     28  Conservative Substitutions        =     0


X 10 20 

MGNNCYNVVVIVLLLVGCEK-V 

III I 

VRPRLLAALAPTFLGCLLLQVTAGAGIPEKAFNLTWISTDFKTILEHQPKPTNYTYTVQISDRSRNWKNKCF 
10 20 30 40 50 X 60 70 

30 40 50 60 70 80 90 

GAVQNSCDNC9PGTFCRKYNPVCK-SCPPSTFSSIGGQPNCNICRVCAGYFRFKKF-CSSTHNAECECIEGF 

II I I I I I II I I I 

STTDTECDLTDEIVKDVTWAYEAKVLSVPRRNSVHGDGD9LVIHGEEPPFTNAPKFLPYRDTNLG0PVIQQF 
80 90 100 110 120 130 140 

100 110 120 130 140 150 

HCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNG — TGVCRPWTNCSLDGRSVLKTGTTE — KDVVCG 

I II II I I I I I I I I I II I 

EQDGRKLNVVVKD SLT-LVRKNGTFLTLRQVFGKDLGYIITYRKGSSTGKKTNITNTNEFSIDVEEG 

150 160 170 180 190 200 210 

160 170 180 190 200 210 220 

PPVVSFSPSTTIS-VTPEGGPGGHSLQVLT LFLALT SALLLALIF ITLLFSVLKWIRKKFPHI 

I I I II I I I II I III III II I II 

VSYCFFVQAMIFSRKTNQNSPG — SSTVCTEQWKSFLGETL 1 1 VGAVVLLAT IF I ILL5I SLCKRRK — NR 
220 230 240 250 260 270 280 

230 X 240 250 

FKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL 

I I I 

AGQKGKNTPSRLA 
290 X 


5. ELLIS-012-FIG2AB.PEP (1-256)

S15783 Heat-stable antigen precursor - Mouse

ENTRY           S15783     #Type Protein
TITLE           Heat-stable antigen precursor - Mouse
PLACEMENT       0.0    0.0    0.0    0.0    0.0
SOURCE          Mus musculus #Common-name house mouse
ACCESSION       S15783
REFERENCE
   #Authors     Wenger R.H., Ayane M., Bose R., Koehler G., Nielsen
                P.J.
   #Journal     Eur. J. Immunol. (1991) 21:1039-1046
   #Title       The genes for a mouse hematopoietic differentiation
                marker called the heat-stable antigen.
   #Reference-number S15783; MUID:91209380
   #Accession   S15783
   #Status      preliminary
   #Residues    1-45 <WEN>
   #Cross-reference EMBL:X53825
SUMMARY         #Molecular-weight 4485  #Length 45  #Checksum 9465
SEQUENCE

Initial Score    =     10  Optimized Score =     15  Significance =  3.96
Residue Identity =    36%  Matches         =     17  Mismatches   =    28
Gaps             =      2  Conservative Substitutions        =     0


110 120 130 140 150 160 170 

PGQELTKBGCKTCSLGTFNDSMGTGVCRPWTNCSLDGRSVLKTGTTEKDVVCGPPVVSFSPSTTISVTPEGG 

I III I 

APFPGN8NISASPNPSNATTRG 
X 10 20 


180 190 200 X 210 220 230 240 250 

PGGHSLQVLTLFLALTSALLLALIFITLLFSVLKUIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGG 
II III III I II I 
-GGSSLQSTAGLLAL-SLSLLHLYC 
30 40 X 


GGG 


6. ELLIS-012-FIG2AB.PEP (1-256) 

S01877 NADH dehydrogenase (ubiquinone) chain 5 - Brine

ENTRY           S01877 #Type Protein (fragment)
TITLE           NADH dehydrogenase (ubiquinone) chain 5 - Brine
                shrimp mitochondrion (SGC4) (fragment) #EC-number
                1.6.5.3
DATE            31-Mar-1990 #Sequence 31-Mar-1990 #Text 23-Mar-1993
PLACEMENT       0.0 0.0 0.0 0.0 0.0
SOURCE          mitochondrion Artemia sp. #Common-name brine shrimp
ACCESSION       S01877
REFERENCE
   #Authors     Batuecas B., Garesse R., Calleja M., Valverde J.R.,
                Marco R.
   #Journal     Nucleic Acids Res. (1988) 16:6515-6529
   #Title       Genome organization of Artemia mitochondrial DNA.
   #Reference-number S01207; MUID:88289417
   #Accession   S01877
   #Molecule-type DNA
   #Residues    1-59 <BAT>
   #Cross-reference EMBL:X07663
KEYWORDS        mitochondrion\ oxidoreductase
GENETIC
   #Special-code 4
SUMMARY         #Length 59 #Checksum 9192
SEQUENCE


Initial Score    = 10   Optimized Score = 14   Significance = 3.96
Gaps             = 2    Conservative Substitutions = 0


110 120 130 140 150 160 170 180 

ELTKQGCKTCSLGTFNDQNGTGVCRPWTNCSLDGRSVLKTGTTEKDVVCGPPVVSFSPSTTISVTPEGGPGG


MGELLYHEGDCGWVEEAGPSLI 
X 10 20 

190 200 210 X 220 230 240 250 

H--SLQVLTLFLALTSALLLALIFITLLFSVLKWIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGG


HHNSLRGSSLFSFLTSSPYKVLILSSLLFTLFMYSMA
30 40 50 X 


GGYEL 


7. ELLIS-012-FIG2AB.PEP (1-256) 

S15784 Heat-stable antigen - Mouse

ENTRY           S15784 #Type Protein
TITLE           Heat-stable antigen - Mouse
DATE            07-Apr-1992 #Sequence 07-Apr-1992 #Text 07-Apr-1992
PLACEMENT       0.0 0.0 0.0 0.0 0.0
SOURCE          Mus musculus #Common-name house mouse
ACCESSION       S15784
REFERENCE
   #Authors     Wenger R.H., Ayane M., Bose R., Koehler G., Nielsen P.J.
   #Journal     Eur. J. Immunol. (1991) 21:1039-1046
   #Title       The genes for a mouse hematopoietic differentiation
                marker called the heat-stable antigen.
   #Reference-number S15783; MUID:91209380
   #Accession   S15784
   #Status      preliminary
   #Residues    1-76 <WEN>
   #Cross-reference EMBL:X56469
SUMMARY         #Molecular-weight 7797 #Length 76 #Checksum 2479
SEQUENCE


Initial Score    = 10   Optimized Score = 19   Significance = 3.96
Residue Identity = 28%  Matches = 22           Mismatches = 52
Gaps             = 4    Conservative Substitutions = 0


80 90 100 110 120 130 140 

SSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTGVCRPWTNCSLDGRSVLKT


MGRAMVARLGLGLLLLALLLPT 
X 10 20 

150 160 170 180 190 200 X 210 

--GTTEKDVVCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLALIFITLLFSVLKWIRKKFP

II III I II III III I II I 

QIYCNQTSVAPFPGNQNISASPNPSNATTRG-GGSSLQSTAGLLAL-SLSLLHLYC
30 40 50 60 70 X 


220 230 240 250 

HIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGG 


8. ELLIS-012-FIG2AB.PEP (1-256) 

A43537 Heat stable antigen M1/69-J11d precursor - Mouse

ENTRY           A43537 #Type Protein
DATE            06-Nov-1992 #Sequence 06-Nov-1992 #Text 23-Mar-1993
PLACEMENT       0.0 0.0 0.0 0.0 0.0
SOURCE          Mus musculus #Common-name house mouse
ACCESSION       A43537
REFERENCE
   #Authors     Kay R., Takei F., Humphries R.K.
   #Journal     J. Immunol. (1990) 145:1952-1959
   #Title       Expression cloning of a cDNA encoding M1/69-J11d
                heat-stable antigens.
   #Reference-number A43537; MUID:90361906
   #Accession   A43537
   #Status      preliminary
   #Molecule-type mRNA
   #Residues    1-76 <KAY>
   #Cross-reference GB:M58661
   #Comment     This sequence has not been compared to the
                nucleotide translation.
SUMMARY         #Molecular-weight 7797 #Length 76 #Checksum 2479
SEQUENCE


Initial Score    = 10   Optimized Score = 19   Significance = 3.96
Residue Identity = 28%  Matches = 22           Mismatches = 52
Gaps             = 4    Conservative Substitutions = 0


80 90 100 110 120 130 140 

SSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTGVCRPWTNCSLDGRSVLKT 

I III 

MGRAMVARLGLGLLLLALLLPT 
X 10 20 


150 160 170 180 190 200 X 210 

--GTTEKDVVCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLALIFITLLFSVLKWIRKKFP

II III I II III III I II I 
QIYCNQTSVAPFPGNQNISASPNPSNATTRG-GGSSLQSTAGLLAL-SLSLLHLYC 
30 40 50 60 70 X 


220 230 240 250 

HIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGG 


9. ELLIS-012-FIG2AB.PEP (1-256) 

S21969 19K zein precursor (clone ZG31A) - Maize (fragment

ENTRY           S21969 #Type Protein (fragment)
TITLE           19K zein precursor (clone ZG31A) - Maize (fragment)
DATE            04-Dec-1992 #Sequence 04-Dec-1992 #Text 23-Mar-1993
PLACEMENT       0.0 0.0 0.0 0.0 0.0
SOURCE          Zea mays #Common-name maize
ACCESSION       S21969
REFERENCE
   #Authors     Hu N.T., Peifer M.A., Heidecker G., Messing J.,
                Rubenstein I.
   #Journal     EMBO J. (1982) 1:1337-1342
   #Title       Primary structure of a genomic zein sequence of
                maize.
   #Reference-number S07172; MUID:84207882
   #Accession   S21969
   #Molecule-type mRNA
   #Residues    1-214 <HUN>
   #Cross-reference EMBL:V01473
   #Comment     The translation of the nucleotide sequence is not
                given in this paper.
SUPERFAMILY     #Name zein
KEYWORDS        seed\ storage protein
SUMMARY         #Length 214 #Checksum 4377
SEQUENCE



Initial Score = 10 Optimized Score = 19 Significance = 3.96 

Residue Identity = 21% Matches = 27 Mismatches = 83

Gaps = 16 Conservative Substitutions = 0 

90 100 110 120 130 X 140 150 

AECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTGVCRPWTNCSLDGRSVLKTGTTEK 

II I 

ATIFPSCSQAPI ASLLPPYLSP 
X 10 20 


160 170 180 190 200 210 220 

DV — VCGPPV VSFSPSTT I SVTPEG GPGGHSL SVLTLFLALTSALLLAL1FITLLF S VL KUIRKKFPHI FK8 

I II I I I I III till I II II I 

AVSSVCENP — ILQPYRI QQAITAG I LPLSPLFLQQSSALLHQLPLVHLL — AQNIR AQ9LQ 

30 40 50 60 70 80 


230 240 250 X 

PFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL

II i I 

QLVLANLAAYSQQQQFLPFNQLAALNSASYLQQQQLPFSQLPAAYPQQFLPFNQLAALNSPAYLQQQQLLPF 
90 100 110 X 120 130 140 150 

SQL AGVSPAT 
160 


10. ELLIS-012-FIG2AB.PEP (1-256) 

ZIZMA2 19K zein precursor (clone cZ19A2) - Maize (fragmen

ENTRY           ZIZMA2 #Type Protein (fragment)
TITLE           19K zein precursor (clone cZ19A2) - Maize (fragment)
DATE            30-Jun-1988 #Sequence 30-Jun-1988 #Text 31-Mar-1993
PLACEMENT       1340.0 1.0 4.0 3.0 1.0
SOURCE          Zea mays #Common-name maize
ACCESSION       D24557
REFERENCE       (Inbred line W64A)
   #Authors     Marks M.D., Lindell J.S., Larkins B.A.
   #Journal     J. Biol. Chem. (1985) 260:16451-16459
   #Title       Nucleotide sequence analysis of zein mRNAs from
                maize endosperm.
   #Reference-number A92510; MUID:86059563
   #Accession   D24557
   #Molecule-type mRNA
   #Residues    1-230 <MAR>
   #Comment     The authors translated the codon GAC for residue 209
                as Asn.
SUPERFAMILY     #Name zein
KEYWORDS        seed\ storage protein
FEATURE
   1-18         #Domain signal sequence (fragment) <SIG>\
   19-230       #Protein 19K zein <MAT>
SUMMARY         #Length 230 #Checksum 8546
SEQUENCE


Initial Score    = 10   Optimized Score = 23   Significance = 3.96
Residue Identity = 22%  Matches = 33           Mismatches = 94
Gaps             = 19   Conservative Substitutions = 0


70 80 90 100 110 X 120 130 

ICRVCAGYFRFKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTGVCR


K I FCFLMLLG-LSASA AT AT I F 
X 10 20 


PWTNCSLDGRSVLKTGTTEKDV--VCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLALIFI

i li I I II I I I I III Mil I 

P — QCS8 AP I TSLLPP YLSP A VSS VCENP — IL8FYR18QAIAAG ILPLSPLFLQ8PSALLQQLPLV 

30 40 50 60 70 80 

210 220 230 240 250 X 

TLLFSVLKWIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL

II II I II II 

HLL — AQNIR A00L08LVLGNLAAYS80H8FLPFNGLAALNSAAYL88GLPFS8LAAAYP8QFLPFN 

90 100 110 120 130 140 

QLAALNSAAYLQ880LPPFS8LADVSPAAF 
150 160 170 


11. ELLIS-012-FIG2AB.PEP (1-256) 

S03417 19K zein precursor (clone gZ19AB11) - Maize

ENTRY           S03417 #Type Protein
TITLE           19K zein precursor (clone gZ19AB11) - Maize
ALTERNATE-NAME  zein alpha
DATE            07-Sep-1990 #Sequence 07-Sep-1990 #Text 23-Mar-1993
PLACEMENT       0.0 0.0 0.0 0.0 0.0
SOURCE          Zea mays #Common-name maize
ACCESSION       S03417
REFERENCE
   #Authors     Kriz A.L., Boston R.S., Larkins B.A.
   #Journal     Mol. Gen. Genet. (1987) 207:90-98
   #Title       Structural and transcriptional analysis of DNA
                sequences flanking genes that encode 19 kilodalton
                zeins.
   #Reference-number S03417; MUID:87257300
   #Accession   S03417
   #Molecule-type DNA
   #Residues    1-234 <KRI>
   #Cross-reference EMBL:X05911
   #Comment     The translation of the nucleotide sequence is not
                given in this paper.
SUPERFAMILY     #Name zein
KEYWORDS        seed\ storage protein
FEATURE
   1-21         #Domain signal sequence <SIG>\
   22-234       #Protein 19K zein <MAT>
SUMMARY         #Molecular-weight 25439 #Length 234 #Checksum 3229
SEQUENCE


Initial Score    = 10   Optimized Score = 23   Significance = 3.96
Residue Identity = 21%  Matches = 32           Mismatches = 98
Gaps             = 19   Conservative Substitutions = 0

60 70 80 90 100 110 120 130

NCNICRVCAGYFRFKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTG

I I II 

MAAKIFCLLMLLG — LSASAA 
X 10 

140 150 160 170 180 190 200 

VCRPWTNCSLDGRSVLKTGTTEKDV--VCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLAL

III I I II I I I I III HU I 

TATIFTQCS8APIASLLPPYLSSAVSSVCENP— IL6PYRI8QAIAAG I LPLSPLFLQQSSALLQ8L 

20 30 40 50 60 70 80 


210 220 230 240 250 X 

IFITLLFSVLKWIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL

II II I II II 

pi uui i aomip a am pm ui ami AAVcnnaaci pcmsii rci mcacvi ocwsoi dccui PAAVD&nr 


90 100 110 120 130 X 140 

LPFNQLAALNSPAYLQQQQLLPFSQLAGVSPAT 
150 160 170 180 


12. ELLIS-012-FIG2AB.PEP (1-256) 

S21970 19K zein precursor (clone A30) - Maize

ENTRY           S21970 #Type Protein
TITLE           19K zein precursor (clone A30) - Maize
DATE            04-Dec-1992 #Sequence 04-Dec-1992 #Text 23-Mar-1993
PLACEMENT       0.0 0.0 0.0 0.0 0.0
SOURCE          Zea mays #Common-name maize
ACCESSION       S21970
REFERENCE
   #Authors     Hu N.T., Peifer M.A., Heidecker G., Messing J.,
                Rubenstein I.
   #Journal     EMBO J. (1982) 1:1337-1342
   #Title       Primary structure of a genomic zein sequence of
                maize.
   #Reference-number S07172; MUID:84207882
   #Accession   S21970
   #Molecule-type mRNA
   #Residues    1-234 <HUN>
   #Cross-reference EMBL:V01481
   #Comment     The translation of the nucleotide sequence is not
                given in this paper.
SUPERFAMILY     #Name zein
KEYWORDS        seed\ storage protein
FEATURE
   1-21         #Domain signal sequence <SIG>\
   22-234       #Protein 19K zein <MAT>
SUMMARY         #Molecular-weight 25403 #Length 234 #Checksum 977
SEQUENCE


Initial Score    = 10   Optimized Score = 23   Significance = 3.96
Residue Identity = 22%  Matches = 33           Mismatches = 97
Gaps             = 19   Conservative Substitutions = 0

60 70 80 90 100 110 120 130


NCNICRVCAGYFRFKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTG 

I I II I 

NAAKIFCLLMLLG-LSASAATA 
X 10 20 

140 150 160 170 180 190 200 

VCRPWTNCSLDGRSVLKTGTTEKDV--VCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLAL

I II I I II I I I I III INI I 

T IFP — QCSQAPI ASLLPPYLSPAVSSVCENP — ILQPYRIQQAI AAG ILPLSPLFLQ9SSALLS9L 

30 40 50 60 70 80 

210 220 230 240 250 X 

IFITLLFSVLKWIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL 

II II I II II 

PLVHLL — A9NIR A8QLQ9LVLANLAAYS9QQQFLPFN8LAALNSASYL9QQ8LPFSQLPAAYP89F 

90 100 110 120 130 X 140 

LPFNQLAALNSPAYLQQQQLLPFSQLAGVSPAT 
150 160 170 180 


13. ELLIS-012-FIG2AB.PEP (1-256) 

ZIZMB1 19K zein precursor (clone cZ19B1) - Maize

ENTRY           ZIZMB1 #Type Protein
TITLE           19K zein precursor (clone cZ19B1) - Maize
DATE            30-Jun-1988 #Sequence 30-Jun-1988 #Text 31-Mar-1993
PLACEMENT       1340.0 1.0 4.0 2.0 2.0
SOURCE          Zea mays #Common-name maize
ACCESSION       E24557
REFERENCE       (Inbred line W64A)
   #Authors     Marks M.D., Lindell J.S., Larkins B.A.
   #Journal     J. Biol. Chem. (1985) 260:16451-16459
   #Title       Nucleotide sequence analysis of zein mRNAs from
                maize endosperm.
   #Reference-number A92510; MUID:86059563
   #Accession   E24557
   #Molecule-type mRNA
   #Residues    1-234 <MAR>
SUPERFAMILY     #Name zein
KEYWORDS        seed\ storage protein
FEATURE
   1-21         #Domain signal sequence <SIG>\
   22-234       #Protein 19K zein <MAT>
SUMMARY         #Molecular-weight 25435 #Length 234 #Checksum 3129
SEQUENCE

Initial Score = 10 Optimized Score = 23 Significance = 3.96 

Residue Identity = 22% Matches = 33 Mismatches = 97

Gaps = 19 Conservative Substitutions = 0 

60 70 80 90 100 110 120 130 

NCNICRVCAGYFRFKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTG 

1 I II I 

MAAKIFCLLMLLG-LSASAATA 
X 10 20 

140 150 160 170 180 190 200 

VCRPWTNCSLDGRSVLKTGTTEKDV--VCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLAL

I II I I II I I I I III llli I 

TIFP— QCSQAPIASLLPPYLSSAVSSVCENP — ILQPYRIQ8AIAAG ILPLSPLFLQQSSALLQQL 

30 40 50 60 70 80 

210 220 230 240 250 X 

IFITLLFSVLKWIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL 

II II I II II 

PLVHLL — AQNIR AQQLQQLVLANLAAYSQQQQFLPFNQLGSLNSASYLQQQSLPFSSLPAAYPSSF 

90 100 110 120 130 X 140 

LPFNQLAALNSPAYLQQQQLLPFSQLAGVSPAT 
150 160 170 180 


14. ELLIS-012-FIG2AB.PEP (1-256) 

ZIZM3 19K zein precursor (clone A30) - Maize 

ENTRY           ZIZM3 #Type Protein
TITLE           19K zein precursor (clone A30) - Maize
DATE            18-Dec-1981 #Sequence 30-Jun-1988 #Text 31-Mar-1993
PLACEMENT       1340.0 1.0 4.0 2.0 1.0
SOURCE          Zea mays #Common-name maize
ACCESSION       C22762\ A03349
REFERENCE       (Clone A30, sequence translated from the mRNA
                sequence)
   #Authors     Geraghty D., Peifer M.A., Rubenstein I., Messing J.
   #Journal     Nucleic Acids Res. (1981) 9:5163-5174
   #Title       The primary structure of a plant protein: zein.
   #Reference-number A93741; MUID:82081837
REFERENCE       (Revision to amino end)
   #Authors     Geraghty D.E., Messing J., Rubenstein I.

   #Journal     EMBO J. (1982) 1:1329-1335



   #Title       Sequence analysis and comparison of cDNAs of the
                zein multigene family.
   #Reference-number A90967; MUID:84207881
SUPERFAMILY     #Name zein
KEYWORDS        seed\ storage protein
FEATURE
   1-21         #Domain signal sequence <SIG>\
   22-234       #Protein 19K zein <MAT>
SUMMARY         #Molecular-weight 25403 #Length 234 #Checksum 977
SEQUENCE

Initial Score = 10 Optimized Score = 23 Significance = 3.96 

Residue Identity = 22% Matches = 33 Mismatches = 97

Gaps = 19 Conservative Substitutions = 0 

60 70 80 90 100 110 120 130 

NCNICRVCAGYFRFKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTG

I I II I 

MAAKIFCLLHLLG-LSASAATA 
X 10 20 

140 150 160 170 180 190 200 

VCRPWTNCSLDGRSVLKTGTTEKDV--VCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLAL 

i it i i it i i i i iii mi i 

TI FP — QCSQ AP I ASLLPP YLSPAVSSVCENP-- I LQPYR I QQA I AAG ILPLSPLFLQQSSALLQQL 

30 40 50 60 70 80 

210 220 230 240 250 X 

IFITLLFSVLKWIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL

II II 1 II II 

PLVHLL — AQN1R AQSLGQLVLANLAAYS9QQGFLPFNQLAALNSASYLQQQ9LPFSQLPAAYP8QF 

90 100 110 120 130 X 140 

LPFNQLAALNSPAYLQQQQLLPFSQLAGVSPAT 
150 160 170 180 


15. ELLIS-012-FIG2AB.PEP (1-256) 

S15655 Zein, 19K - Maize 

ENTRY           S15655 #Type Protein
TITLE           Zein, 19K - Maize
DATE            04-Apr-1992 #Sequence 04-Apr-1992 #Text 04-Apr-1992
PLACEMENT       0.0 0.0 0.0 0.0 0.0
SOURCE          Zea mays #Common-name maize
ACCESSION       S15655
REFERENCE
   #Authors     Quayle T.J.A., Brown J.W.S., Feix G.
   #Journal     Gene (1989) 80:249-257
   #Title       Analysis of distal flanking regions of maize 19-kDa
                zein genes.
   #Reference-number S15655; MUID:90060774
   #Accession   S15655
   #Status      preliminary
   #Residues    1-235 <QUA>
   #Cross-reference EMBL:X53582
SUMMARY         #Molecular-weight 25505 #Length 235 #Checksum 1651
SEQUENCE

Initial Score = 10 Optimized Score = 23 Significance = 3.96 

Residue Identity = 22% Matches = 33 Mismatches = 97

Gaps = 19 Conservative Substitutions = 0 

60 70 80 90 100 110 120 130 

NCNICRVCAGYFRFKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTG

I I I I I


NAAKIFCLLKLLG-LSASAATA 
X 10 20 


140 150 160 170 180 190 200 

VCRPWTNCSLDGRSVLKTGTTEKDV--VCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLAL

I II I I II I I I I III till I 

TIFP — QCSQAPIASLLPPYLSPAVSSVCENP — ILQPYRIQSAIAAG I LPLSPLFLQQSSALLQQL 

30 40 50 60 70 80 


210 220 230 240 250 X 

IFITLLFSVLKWIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL


PLVHLL — AQNIR AQQLQQLVLANVAAYSQQQ0FLPFN9LAALNSAAYLQQQQLLPFSQLTAAYP8Q 

90 100 110 120 130 X 140 

FLPFN8LAALNSAAYL8QQ8LLPFS8LAVVSPA 
150 160 170 180 



> 0 < 

0| |0 IntelliGenetics

> 0 < 

FastDB - Fast Pairwise Comparison of Sequences 
Release 5.4 

Results file ellis-012-fig2ab-spt.res made by shears on Tue 14 Sep 93 15:06:00-PDT.


Query sequence being compared:ELLIS-012-FIG2AB.PEP (1-256) 

Number of sequences searched: 29955 

Number of scores above cutoff: 3792 

Results of the initial comparison of ELLIS-012-FIG2AB.PEP (1-256) with:
Data bank : Swiss-Prot 25, all entries

[Histogram: NUMBER OF SEQUENCES (logarithmic axis, ticks at 0, 5, 10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000) plotted against SCORE (ticks at 0, 28, 57, 85, 114, 142, 171, 199, 228, 256) and STDEV]


PARAMETERS 


Similarity matrix          Unitary
K-tuple                    2
Mismatch penalty           5
Joining penalty            30
Gap penalty                1.00
Window size                32
Gap size penalty           0.26
Cutoff score               0
Randomization group        0
Initial scores to save     40
Alignments to save         15
Optimized scores to save   0
Display context            50


SEARCH STATISTICS 


Scores:   Mean           Median          Standard Deviation
          4              5               1.75

Times:    CPU            Total Elapsed
          00:01:57.02    00:03:57.00

Number of residues:              10214020
Number of sequences searched:    29955
Number of scores above cutoff:   3792



Cut-off raised to 4. 

Cut-off raised to 5. 

Cut-off raised to 6. 

Cut-off raised to 7. 

The scores below are sorted by initial score. 

Significance is calculated based on initial score. 
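
The significance values listed in this search appear consistent with expressing each initial score as a number of standard deviations above the mean of all scores. The following is a minimal sketch under that assumption, not the FastDB implementation; the function name `significance` is hypothetical, and the mean (4) and standard deviation (1.75) are the rounded values printed under SEARCH STATISTICS.

```python
def significance(initial_score: float, mean: float, stdev: float) -> float:
    """Number of standard deviations by which a score exceeds the mean score.

    Assumed formula (not confirmed by the listing): (score - mean) / stdev.
    """
    return (initial_score - mean) / stdev


# Rounded statistics as printed for this search (assumed inputs).
MEAN, STDEV = 4, 1.75

# Initial scores that occur in the list of best scores.
for score in (11, 10, 9, 8):
    print(score, round(significance(score, MEAN, STDEV), 2))
```

With these rounded inputs the sketch gives 4.0, 3.43, 2.86 and 2.29, which agree with the Sig. column. For the self-match (initial score 256) it gives 144.0 rather than the listed 143.96, which suggests the program works from unrounded statistics.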

A 100% identical sequence to the query sequence was found:

                                                    Init. Opt.
    Sequence Name Description                Length Score Score   Sig. Frame

 1. 41BB_MOUSE  T CELL ANTIGEN 4-1BB PRECURSO   256   256   256 143.96  0

The list of other best scores is: 


                                                    Init. Opt.
    Sequence Name Description                Length Score Score   Sig. Frame

              **** 3 standard deviations above mean ****
 2. TF_MOUSE    TISSUE FACTOR PRECURSOR (TF)    294    11    42   4.00
 3. NU5M_ARTSX  NADH-UBIQUINONE OXIDOREDUCTAS    59    10    14   3.43
 4. M169_MOUSE  M1/69-J11D HEAT STABLE ANTIGE    76    10    19   3.43
 5. ZEAB_MAIZE  ZEIN-ALPHA PRECURSOR (19 KD)    186    10    21   3.43
 6. ZEA3_MAIZE  ZEIN-ALPHA PRECURSOR (19 KD)    230    10    23   3.43
 7. ZEA5_MAIZE  ZEIN-ALPHA PRECURSOR (19 KD)    234    10    23   3.43  0
 8. ZEA4_MAIZE  ZEIN-ALPHA PRECURSOR (19 KD)    234    10    23   3.43  0
 9. ZEA1_MAIZE  ZEIN-ALPHA PRECURSOR (19 KD)    234    10    23   3.43  0
10. ZEAC_MAIZE  ZEIN-ALPHA PRECURSOR (19 KD)    235    10    23   3.43  0
11. ZEA2_MAIZE  ZEIN-ALPHA PRECURSOR (19 KD)    235    10    22   3.43  0
12. ZEAL_MAIZE  ZEIN-ALPHA PRECURSOR (CLONE Z   253    10    21   3.43  0
13. MO34_MOUSE  MOV34 PROTEIN.                  321    10    38   3.43  0
14. ATPB_SULAC  MEMBRANE-ASSOCIATED ATPASE BE   465    10    39   3.43  0
15. TENA_CHICK  TENASCIN PRECURSOR (TN) (HEXA  1808    10    36   3.43  0
              **** 2 standard deviations above mean ****
16. KR2_HUMAN   HKR2 PROTEIN (FRAGMENT).        106     9    23   2.86  0
17. WAP_RABIT   WHEY ACIDIC PROTEIN PRECURSOR   127     9    16   2.86  0
18. NU6M_DROYA  NADH-UBIQUINONE OXIDOREDUCTAS   174     9    15   2.86  0
19. NU6M_DROME  NADH-UBIQUINONE OXIDOREDUCTAS   174     9    16   2.86  0
20. YEIB_ECOLI  HYPOTHETICAL PROTEIN IN GALS    196     9    28   2.86  0
21. HBG3_MOUSE  INT-2 PROTO-ONCOGENE PROTEIN    245     9    34   2.86  0
22. NIFC_CLOPA  NIFC PROTEIN.                   286     9    16   2.86  0
23. YCE9_YEAST  HYPOTHETICAL 35.6 KD PROTEIN    312     9    16   2.86  0
24. ASG2_ECOLI  L-ASPARAGINASE II PRECURSOR (   348     9    39   2.86  0
25. HC_HUMAN    ALPHA-1-MICROGLOBULIN / INTER   352     9    34   2.86  0
26. DBDR_RAT    D(1B) DOPAMINE RECEPTOR.        475     9    18   2.86  0
27. D5DR_HUMAN  D(5) DOPAMINE RECEPTOR.         477     9    19   2.86  0
28. LMP2_EBV    GENE TERMINAL PROTEIN (MEMBRA   497     9    37   2.86  0
29. MPP1_NEUCR  MITOCHONDRIAL PROCESSING PEPT   577     9    21   2.86  0
30. HS75_YEAST  HEAT SHOCK PROTEIN SSB1.        613     9    35   2.86  0
31. EF3_PNECA   ELONGATION FACTOR 3 (EF-3).    1042     9    40   2.86  0
32. NRG_DROME   NEUROGLIAN PRECURSOR.          1239     9    37   2.86  0
33. IP3R_DROME  INOSITOL 1,4,5-TRISPHOSPHATE-  2833     9    35   2.86  0
34. DEF1_RABIT  CORTICOSTATIN I PRECURSOR (CS    93     8    18   2.29  0
35. CYB_GEOSD   CYTOCHROME B (EC 1.10.2.2) (F    96     8    10   2.29  0
36. APC2_CAVPO  APOLIPOPROTEIN C-II PRECURSOR   100     8    15   2.29  0
37. VPX_HIV2D   VPX PROTEIN (X ORF PROTEIN).    111     8    19   2.29  0
38. VPX_SIVS4   VPX PROTEIN (X ORF PROTEIN).    112     8    19   2.29  0
39. COL_CANFA   COLIPASE PRECURSOR.             112     8    19   2.29  0
40. YSCB_YEREN  HYPOTHETICAL YSC OPERON PROTE   137     8    21   2.29  0



1. ELLIS-012-FIG2AB.PEP (1-256) 

41BB_MOUSE T CELL ANTIGEN 4-1BB PRECURSOR.

ID   41BB_MOUSE STANDARD; PRT; 256 AA.
AC   P20334;
DT   01-FEB-1991 (REL. 17, CREATED)
DT   01-FEB-1991 (REL. 17, LAST SEQUENCE UPDATE)
DT   01-APR-1993 (REL. 25, LAST ANNOTATION UPDATE)
DE   T CELL ANTIGEN 4-1BB PRECURSOR.
OS   MUS MUSCULUS (MOUSE).
OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA;
OC   EUTHERIA; RODENTIA.
RN   [1]
RP   SEQUENCE FROM N.A.
RM   89184547
RA   KWON B.S., WEISSMAN S.M.;
RL   PROC. NATL. ACAD. SCI. U.S.A. 86:1963-1967(1989).
RN   [2]
RP   CHARACTERIZATION, AND SEQUENCE OF 25-29.
RA   POLLOK K.E., KIM Y.-J., ZHOU Z., HURTADO J., KIM K.K., PICKARD R.T.,
RA   KWON B.S.;
RL   J. IMMUNOL. 150:771-781(1993).
CC   -!- FUNCTION: PUTATIVE RECEPTOR FOR A CYTOKINE. POSSIBLY ACTIVE
CC       DURING T CELL ACTIVATION.
CC   -!- SUBUNIT: PRINCIPALLY AN HOMODIMER, BUT ALSO FOUND AS A MONOMER.
CC   -!- INDUCTION: OPTIMAL BY PMA AND IONOMYCIN.
CC   -!- TISSUE SPECIFICITY: EXPRESSED ON THE SURFACE OF ACTIVATED T CELLS.
CC   -!- SIMILARITY: CONTAINS A LA-NGFR/TNFR-TYPE CYSTEINE-RICH REGION.

DR   EMBL; J04492;


DR PIR; B32393; B32393. 

DR   PROSITE; PS00652; TNFR_NGFR.
KW   RECEPTOR; GLYCOPROTEIN; SIGNAL.


FT   SIGNAL        1     24
FT   CHAIN        25    256       T CELL ANTIGEN 4-1BB.
FT   DOMAIN       17    159       NGFR/TNFR REPEATS.
FT   REPEAT       17     45       NGFR/TNFR REPEAT 1.
FT   REPEAT       46     85       NGFR/TNFR REPEAT 2.
FT   REPEAT       86    117       NGFR/TNFR REPEAT 3.
FT   REPEAT      118    159       NGFR/TNFR REPEAT 4.
FT   CARBOHYD    128    128       POTENTIAL.
FT   CARBOHYD    138    138       POTENTIAL.
SQ   SEQUENCE   256 AA; 27! MW; 347415 CN;


Initial Score    = 256   Optimized Score = 256   Significance = 143.96
Residue Identity = 100%  Matches = 256           Mismatches = 0
Gaps             = 0     Conservative Substitutions = 0


X 10 20 30 40 50 60 70
MGNNCYNVVVIVLLLVGCEKVGAVQNSCDNCQPGTFCRKYNPVCKSCPPSTFSSIGGQPNCNICRVCAGYFR
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
MGNNCYNVVVIVLLLVGCEKVGAVQNSCDNCQPGTFCRKYNPVCKSCPPSTFSSIGGQPNCNICRVCAGYFR
X 10 20 30 40 50 60 70

80 90 100 110 120 130 140
FKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTGVCRPWTNCSLDGR
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
FKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTGVCRPWTNCSLDGR
80 90 100 110 120 130 140

150 160 170 180 190 200 210
SVLKTGTTEKDVVCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLALIFITLLFSVLKWIRK
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
SVLKTGTTEKDVVCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLALIFITLLFSVLKWIRK
150 160 170 180 190 200 210

220 230 240 250 X
KFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL
||||||||||||||||||||||||||||||||||||||||
KFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL
220 230 240 250 X


2. ELLIS-012-FIG2AB.PEP (1-256) 

TF_MOUSE TISSUE FACTOR PRECURSOR (TF) (COAGULATION FACTOR I 

ID   TF_MOUSE STANDARD; PRT; 294 AA.
AC   P20352;
DT   01-FEB-1991 (REL. 17, CREATED)
DT   01-AUG-1991 (REL. 19, LAST SEQUENCE UPDATE)
DT   01-MAY-1992 (REL. 22, LAST ANNOTATION UPDATE)
DE   TISSUE FACTOR PRECURSOR (TF) (COAGULATION FACTOR III).
GN   CF-3.
OS   MUS MUSCULUS (MOUSE).
OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA;
OC   EUTHERIA; RODENTIA.
RN   [1]
RP   SEQUENCE FROM N.A.
RM   91093171
RA   RANGANATHAN G., BLATTI S.P., SUBRAMANIAM M., FASS D.N., MAIHLE N.J.,
RA   GETZ M.J.;
RL   J. BIOL. CHEM. 266:496-501(1991).
RN   [2]
RP   SEQUENCE FROM N.A.
RC   STRAIN=BALB/C;
RM   89343974
RA   HARTZELL S., RYDER K., LANAHAN A., LAU L.F., NATHANS D.;
RL   MOL. CELL. BIOL. 9:2567-2573(1989).
CC   -!- FUNCTION: INITIATES BLOOD COAGULATION BY FORMING A COMPLEX WITH
CC       CIRCULATING FACTOR VII OR VIIA. THE [TF:VIIA] COMPLEX ACTIVATES
CC       FACTORS IX OR X BY SPECIFIC LIMITED PROTOLYSIS. TF PLAYS A ROLE IN
CC       NORMAL HEMOSTASIS BY INITIATING THE CELL-SURFACE ASSEMBLY AND
CC       PROPAGATION OF THE COAGULATION PROTEASE CASCADE.
DR   EMBL; M57896; MMTFA.
DR   EMBL; M26071; MMTF.
DR   PIR; A32318; A32318.
DR   PIR; A39046; A39046.
DR   PROSITE; PS00621; TISSUE_FACTOR.

KW GLYCOPROTEIN; BLOOD COAGULATION; TRANSMEMBRANE; SIGNAL; LIPOPROTEIN. 


FT   SIGNAL        1     28
FT   CHAIN        29    294       TISSUE FACTOR.
FT   DOMAIN       29    251       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM    252    274       POTENTIAL.
FT   DOMAIN      275    294       CYTOPLASMIC (POTENTIAL).
FT   SITE        245    247       WKS MOTIF.
FT   CARBOHYD     37     37       POTENTIAL.
FT   CARBOHYD     57     57       POTENTIAL.
FT   CARBOHYD    169    169       POTENTIAL.
FT   CARBOHYD    200    200       POTENTIAL.
FT   DISULFID     75     83       BY SIMILARITY.
FT   DISULFID    218    241       BY SIMILARITY.
FT   LIPID       275    275       PALMITATE (BY SIMILARITY).
FT   CONFLICT     26     26       I -> T (IN REF. 2).
SQ   SEQUENCE   294 AA; 32935 MW; 468130 CN;


Initial Score    = 11   Optimized Score = 42   Significance = 4.00
Residue Identity = 23%  Matches = 58           Mismatches = 165
Gaps             = 28   Conservative Substitutions = 0


X 10 20 

MGNNCYNVVVIVLLLVGCEK-V 

III I 

VRPRLLAALAPTFLGCLLLQVIAGAGIPEKAFNLTWISTDFKTILEWGPKPTNYTYTVQISDRSRNWKNKCF
10 20 30 40 50 X 60 70 

30 40 50 60 70 80 90 

GAVQNSCDNCQPGTFCRKYNPVCK-SCPPSTFSSIGGQPNCNICRVCAGYFRFKKF-CSSTHNAECECIEGF

II I I I I I II I I I 

STTDTECDLTDEIVKDVTWAYEAKVLSVPRRNSVHGDGDQLVIHGEEPPFTNAPKFLPYRDTNLGQPVIQQF 
80 90 100 110 120 130 140 


100 110 120 130 140 150 

HCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNG--TGVCRPWTNCSLDGRSVLKTGTTE--KDVVCG

I II II I I I I I I II I II I 

EQDGRKLNVVVKD SLT-L VRKNGTFLTLRaVFGKDLG Y I I TYRKGSSTGKKTKI TNTNEFS I DVEEG 

150 160 170 180 190 200 210 

160 170 180 190 200 210 220 

PPVVSFSPSTTIS-VTPEGGPGGHSL8VLT LFLALT SALLLAL IF ITLLFSVLKWI RKKFPHI 

I I I II I I I II I III III || I || 

VSYCFFVQAMIFSRKTNQNSPG — S3TVCTEQWKSFLGETL 1 1 VGAVVLL AT IF I ILLSISLCKRRK — NR 
220 230 240 250 260 270 280 

230 X 240 250 

FKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL 

I I I 

AGQKGKNTPSRLA 
290 X 


3. ELLIS-012-FIG2AB.PEP (1-256) 

NU5M_ARTSX NADH-UBIQUINONE OXIDOREDUCTASE CHAIN 5 (EC 1.6.5.3

ID   NU5M_ARTSX STANDARD; PRT; 59 AA.
AC   P19047;
DT   01-NOV-1990 (REL. 16, CREATED)
DT   01-NOV-1990 (REL. 16, LAST SEQUENCE UPDATE)
DT   01-NOV-1990 (REL. 16, LAST ANNOTATION UPDATE)
DE   NADH-UBIQUINONE OXIDOREDUCTASE CHAIN 5 (EC 1.6.5.3) (FRAGMENT).
GN   ND5.
OS   ARTEMIA SP. (BRINE SHRIMP).
OG   MITOCHONDRION.
OC   EUKARYOTA; METAZOA; ARTHROPODA; CRUSTACEA; BRANCHIOPODA.
RN   [1]
RP   SEQUENCE FROM N.A.
RM   88289417
RA   BATUECAS B., GARESSE R., CALLEJA M., VALVERDE J.R., MARCO R.;
RL   NUCLEIC ACIDS RES. 16:6515-6529(1988).
CC   -!- CATALYTIC ACTIVITY: NADH + UBIQUINONE = NAD(+) + UBIQUINOL.
DR   EMBL; X07663; MIAS07.
DR   PIR; S01877; S01877.
KW   OXIDOREDUCTASE; NAD; UBIQUINONE; MITOCHONDRION.
FT   NON_TER       1      1
SQ   SEQUENCE   59 AA; 6585 MW; 22406 CN;


Initial Score    = 10   Optimized Score = 14   Significance = 3.43
Residue Identity = 25%  Matches = 15           Mismatches = 42
Gaps             = 2    Conservative Substitutions = 0


110 120 130 140 150 160 170 180 

ELTKQGCKTCSLGTFNDQNGTGVCRPWTNCSLDGRSVLKTGTTEKDVVCGPPVVSFSPSTTISVTPEGGPGG


MGELLYHEGDCGWVEEAGPSLI 
X 10 20 


190 200 210 X 220 230 240 250 

H--SLQVLTLFLALTSALLLALIFITLLFSVLKWIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGG
I I! II III II III 
HHNSLRGSSLFSFLTSSPYKVLILSSLLFTLFMYSMA 
30 40 50 X 


GGYEL 


4. ELLIS-012-FIG2AB.PEP (1-256)

M169_MOUSE M1/69-J11D HEAT STABLE ANTIGEN PRECURSOR.

ID   M169_MOUSE STANDARD; PRT; 76 AA.
AC   P24807;
DT   01-MAR-1992 (REL. 21, CREATED)
DT   01-MAR-1992 (REL. 21, LAST SEQUENCE UPDATE)
DT   01-AUG-1992 (REL. 23, LAST ANNOTATION UPDATE)
DE   M1/69-J11D HEAT STABLE ANTIGEN PRECURSOR.
GN   HSA-A.
OS   MUS MUSCULUS (MOUSE).
OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA;
OC   EUTHERIA; RODENTIA.
RN   [1]
RP   SEQUENCE FROM N.A.
RM   90361906
RA   KAY R., TAKEI F., HUMPHRIES R.K.;
RL   J. IMMUNOL. 145:1952-1959(1990).
RN   [2]
RP   SEQUENCE FROM N.A.
RC   STRAIN=CBA X C57BL/6; TISSUE=SPLEEN;
RM   91209380
RA   WENGER R.H., AYANE M., BOSE R., KOEHLER G., NIELSEN P.J.;
RL   EUR. J. IMMUNOL. 21:1039-1046(1991).
CC   -!- FUNCTION: MAY HAVE A SPECIFIC ROLE TO PLAY IN EARLY THYMOCYTE
CC       DEVELOPMENT.
CC   -!- SUBCELLULAR LOCATION: ATTACHED TO THE MEMBRANE BY A GPI-ANCHOR.
CC   -!- TISSUE SPECIFICITY: IN LYMPHOID, MYELOID, AND ERYTHROID CELLS.
CC   -!- SIMILARITY: TO HUMAN SIGNAL TRANSDUCER CD24.
DR   EMBL; M58661; MMM169J1.
DR   EMBL; X56469; MMHSAAG.
DR   PIR; S15784; S15784.
DR   PIR; A43537; A43537.
KW   ANTIGEN; SIGNAL; GPI-ANCHOR; GLYCOPROTEIN; MULTIGENE FAMILY; MEMBRANE.


FT   SIGNAL        1     26       POTENTIAL.
FT   CHAIN        27     56       M1/69-J11D ANTIGEN.
FT   PROPEP       57     76       REMOVED IN MATURE FORM (POTENTIAL).
FT   LIPID        56     56       GPI-ANCHOR (POTENTIAL).
FT   CARBOHYD     27     27       POTENTIAL.
FT   CARBOHYD     39     39       POTENTIAL.
FT   CARBOHYD     48     48       POTENTIAL.
SQ   SEQUENCE   76 AA; 7797 MW; 30445 CN;


Initial Score = 10 Optimized Score = 19 Significance = 3.43 

Residue Identity = 28% Matches = 22 Mismatches = 52

Gaps = 4 Conservative Substitutions = 0 
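
The Residue Identity figures in these summaries appear consistent with dividing the number of matches by the total of matches, mismatches and gaps, and dropping the fractional part. The sketch below works under that assumption only; the listing does not state how FastDB defines the value, and the function name `residue_identity` is hypothetical.

```python
def residue_identity(matches: int, mismatches: int, gaps: int) -> int:
    """Percent identity over all aligned positions, truncated to an integer.

    Assumed definition: matches / (matches + mismatches + gaps).
    """
    aligned_positions = matches + mismatches + gaps
    return 100 * matches // aligned_positions


# Summary values for the alignment above: 22 matches, 52 mismatches, 4 gaps.
print(residue_identity(22, 52, 4))
```

The same assumed formula reproduces the other Swiss-Prot summaries in this listing, for example 58 matches, 165 mismatches and 28 gaps (23%) and 15 matches, 42 mismatches and 2 gaps (25%).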

80 90 100 110 120 130 140 

SSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTGVCRPWTNCSLDGRSVLKT 

I ill 

MGRAMVARLGLGLLLLALLLPT 
X 10 20 

150 160 170 180 190 200 X 210 

--GTTEKDVVCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLALIFITLLFSVLKWIRKKFP

II III I II III III I II I 

QIYCNQTSVAPFPGNQNISASPNPSNATTRG-GGSSLQSTAGLLAL-SLSLLHLYC 
30 40 50 60 70 X 

220 230 240 250 

HIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGG 


5. ELLIS-012-FIG2AB.PEP (1-256) 

ZEAB_MAIZE ZEIN-ALPHA PRECURSOR (19 KD) (CLONE PZ19.1) (FRAGM

ID   ZEAB_MAIZE STANDARD; PRT; 186 AA.
AC   P04705;
DT   13-AUG-1987 (REL. 05, CREATED)
DT   13-AUG-1987 (REL. 05, LAST SEQUENCE UPDATE)
DT   01-AUG-1988 (REL. 08, LAST ANNOTATION UPDATE)
DE   ZEIN-ALPHA PRECURSOR (19 KD) (CLONE PZ19.1) (FRAGMENT).
OS   ZEA MAYS (MAIZE).
OC   EUKARYOTA; PLANTA; EMBRYOPHYTA; ANGIOSPERMAE; MONOCOTYLEDONEAE;
OC   CYPERALES; GRAMINEAE.
RN   [1]
RP   SEQUENCE FROM N.A.
RM   83103094
RA   PEDERSEN K., DEVEREUX J., WILSON D.R., SHELDON E., LARKINS B.A.;
RL   CELL 29:1015-1026(1982).
CC   -!- FUNCTION: ZEINS ARE MAJOR SEED STORAGE PROTEINS.
CC   -!- THE ALPHA ZEINS OF 19 KD AND 22 KD ACCOUNT FOR 70% OF THE TOTAL
CC       ZEIN FRACTION. THEY ARE ENCODED BY A LARGE MULTIGENE FAMILY.
DR   EMBL; V01471; ZMZE02.
KW   SEED STORAGE PROTEIN; TANDEM REPEAT; MULTIGENE FAMILY; SIGNAL.
FT   SIGNAL        1     21
FT   CHAIN        22   >186       ZEIN-ALPHA.
FT   NON_TER     186    186

ccmiPMrc mt aa: wy- i^aon/i w. 


Initial Score = 10 Optimized Score = 21 Significance = 3.43 

Residue Identity = 19% Matches = 29 Mismatches = 105

Gaps = 15 Conservative Substitutions = 0 

60 70 80 90 100 110 120 130 

NCNICRVCAGYFRFKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTG

III! I 

MAAKIFCLIMLLG-LSASAATA 
X 10 20 

140 150 160 170 180 190 200 

VCRPWTNCSLDGRSVL--KTGTTEKDVVCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLAL

I II I II I I I I III III! I 

SIFP — 6CSQAPIASLLPPYLSPAMSSVCENP — ILLF'YRIQQAIAAG ILPLSPLFLQQSSALL8GL 

30 40 50 60 70 80 

210 220 230 240 250 X 

IFITLLFSVLKWIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL 

II II I I 

PLVHLL---AQNIRAQQLQQLVLANLAAYSQQQQLPLVHLLAQNIRAQQLQQLVLANLAAYSQQQQFLPFNQ 
90 100 110 120 130 X 140 150 

QLAAAYPRQFLPFNQLAALNSHAYVQQQQLLPF 
160 170 180 


6. ELLIS-012-FIG2AB.PEP (1-256) 

ZEA3_MAIZE ZEIN-ALPHA PRECURSOR (19 KD) (CLONE 19A2) (FRAGMEN 

ID ZEA3_MAIZE STANDARD; PRT; 230 AA. 

AC P06674; 

DT 01-JAN-1988 (REL. 06, CREATED) 

DT 01-JAN-1988 (REL. 06, LAST SEQUENCE UPDATE) 

DT 01-NOV-1988 (REL. 09, LAST ANNOTATION UPDATE) 

DE ZEIN-ALPHA PRECURSOR (19 KD) (CLONE 19A2) (FRAGMENT). 

OS ZEA MAYS (MAIZE). 

OC EUKARYOTA; PLANTA; EMBRYOPHYTA; ANGIOSPERMAE; MONOCOTYLEDONEAE; 

OC CYPERALES; GRAMINEAE. 

RN [1] 

RP SEQUENCE FROM N.A. 

RM 86059563 

RA MARKS M.D., LINDELL J.S., LARKINS B.A.; 

RL J. BIOL. CHEM. 260:16451-16459(1985). 

CC -!- FUNCTION: ZEINS ARE MAJOR SEED STORAGE PROTEINS. 

CC -!- THE ALPHA ZEINS OF 19 KD AND 22 KD ACCOUNT FOR 70% OF THE TOTAL 
CC ZEIN FRACTION. THEY ARE ENCODED BY A LARGE MULTIGENE FAMILY. 

CC -!- STRUCTURALLY, 22K AND 19K ZEINS ARE COMPOSED OF NINE ADJACENT, 

CC TOPOLOGICALLY ANTIPARALLEL HELICES CLUSTERED WITHIN A DISTORTED 

CC CYLINDER. 

DR EMBL; M12142; ZMZE19A2. 

DR PIR; D24557; ZIZMA2. 

KW SEED STORAGE PROTEIN; TANDEM REPEAT; MULTIGENE FAMILY; SIGNAL. 

FT NON_TER 1 1 

FT SIGNAL <1 18 

FT CHAIN 19 230 ZEIN-ALPHA. 

SQ SEQUENCE 230 AA; 25032 MW; 249816 CN; 

Initial Score = 10 Optimized Score = 23 Significance = 3.43 

Residue Identity = 22% Matches = 33 Mismatches = 94 

Gaps = 19 Conservative Substitutions = 0 

70 80 90 100 110 X 120 130 

ICRVCAGYFRFKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTGVCR 


tfTCrri Ml | CACAATATfC 



X 10 20 


140 150 160 170 180 190 200 

PWTNCSLDGRSVLKTGTTEKDV--VCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLALIFI 

i ii i i ii i i i i iii mi i 

P--QCSQAPITSLLPPYLSPAVSSVCENP--ILQPYRIQQAIAAG ILPLSPLFLQQPSALLQQLPLV 

30 40 50 60 70 80 

210 220 230 240 250 X 

TLLFSVLKWIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL 

II II I II II 

HLL--AQNIR AQQLQQLVLGNLAAYSQQHQFLPFNQLAALNSAAYLQQQLPFSQLAAAYPQQFLPFN 

90 100 110 120 130 140 

QLAALNSAAYLQQQQLPPFSQLADVSPAAF 
150 160 170 


7. ELLIS-012-FIG2AB.PEP (1-256) 

ZEA5_MAIZE ZEIN-ALPHA PRECURSOR (19 KD) (CLONE GZ19AB11). 

ID ZEA5_MAIZE STANDARD; PRT; 234 AA. 

AC P08416; 

DT 01-AUG-1988 (REL. 08, CREATED) 

DT 01-AUG-1988 (REL. 08, LAST SEQUENCE UPDATE) 

DT 01-AUG-1988 (REL. 08, LAST ANNOTATION UPDATE) 

DE ZEIN-ALPHA PRECURSOR (19 KD) (CLONE GZ19AB11). 

OS ZEA MAYS (MAIZE). 

OC EUKARYOTA; PLANTA; EMBRYOPHYTA; ANGIOSPERMAE; MONOCOTYLEDONEAE; 

OC CYPERALES; GRAMINEAE. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=W64A; 

RM 87257300 

RA KRIZ A.L., BOSTON R.S., LARKINS B.A.; 

RL MOL. GEN. GENET. 207:90-98(1987). 

CC -!- FUNCTION: ZEINS ARE MAJOR SEED STORAGE PROTEINS. 

CC -!- THE ALPHA ZEINS OF 19 KD AND 22 KD ACCOUNT FOR 70% OF THE TOTAL 

CC ZEIN FRACTION. THEY ARE ENCODED BY A LARGE MULTIGENE FAMILY. 

DR EMBL; X05911; ZMZEI19. 

DR PIR; S03417; S03417. 

KW SEED STORAGE PROTEIN; TANDEM REPEAT; SIGNAL. 

FT SIGNAL 1 21 

FT CHAIN 22 234 ZEIN-ALPHA. 

SQ SEQUENCE 234 AA; 25439 MW; 271676 CN; 


Initial Score = 10 Optimized Score = 23 Significance = 3.43 

Residue Identity = 21% Matches = 32 Mismatches = 98 

Gaps = 19 Conservative Substitutions = 0 


60 70 80 90 100 110 120 130 

NCNICRVCAGYFRFKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTG 


MAAKIFCLLMLLG--LSASAA 
X 10 


140 150 160 170 180 190 200 

VCRPWTNCSLDGRSVLKTGTTEKDV--VCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLAL 

I II I I II I I I I III llll I 

TATIFTQCSQAPIASLLPFYLSSAVSSVCENP--ILQPYRIQQAIAAG ILPLSPLFLQQSSALLQQL 

20 30 40 50 60 70 80 

210 220 230 240 250 X 

IFITLLFSVLKWIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL 

II II I II II 

Di uui i inn I nm m ami AAvcnnnnci dcmqi rci mcacvi nnnni Deem baavdooc 


90 100 110 120 130 X 140 

LPFNQLAALNSPAYLQQQQLLPFSQLAGVSPAT 
150 160 170 180 


8. ELLIS-012-FIG2AB.PEP (1-256) 

ZEA4_MAIZE ZEIN-ALPHA PRECURSOR (19 KD) (CLONE 19B1). 

ID ZEA4_MAIZE STANDARD; PRT; 234 AA. 

AC P06675; 

DT 01-JAN-1988 (REL. 06, CREATED) 

DT 01-JAN-1988 (REL. 06, LAST SEQUENCE UPDATE) 

DT 01-NOV-1988 (REL. 09, LAST ANNOTATION UPDATE) 

DE ZEIN-ALPHA PRECURSOR (19 KD) (CLONE 19B1). 

OS ZEA MAYS (MAIZE). 

OC EUKARYOTA; PLANTA; EMBRYOPHYTA; ANGIOSPERMAE; MONOCOTYLEDONEAE; 

OC CYPERALES; GRAMINEAE. 

RN [1] 

RP SEQUENCE FROM N.A. 

RM 86059563 

RA MARKS M.D., LINDELL J.S., LARKINS B.A.; 

RL J. BIOL. CHEM. 260:16451-16459(1985). 

CC -!- FUNCTION: ZEINS ARE MAJOR SEED STORAGE PROTEINS. 

CC -!- THE ALPHA ZEINS OF 19 KD AND 22 KD ACCOUNT FOR 70% OF THE TOTAL 

CC ZEIN FRACTION. THEY ARE ENCODED BY A LARGE MULTIGENE FAMILY. 

CC -!- STRUCTURALLY, 22K AND 19K ZEINS ARE COMPOSED OF NINE ADJACENT, 

CC TOPOLOGICALLY ANTIPARALLEL HELICES CLUSTERED WITHIN A DISTORTED 
CC CYLINDER. 

DR EMBL; M12143; ZMZE19B1. 

DR PIR; E24557; ZIZMB1. 

KW SEED STORAGE PROTEIN; TANDEM REPEAT; MULTIGENE FAMILY; SIGNAL. 

FT SIGNAL 1 21 

FT CHAIN 22 234 ZEIN-ALPHA. 

SQ SEQUENCE 234 AA; 25435 MW; 271626 CN; 

Initial Score = 10 Optimized Score = 23 Significance = 3.43 

Residue Identity = 22% Matches = 33 Mismatches = 97 

Gaps = 19 Conservative Substitutions = 0 

60 70 80 90 100 110 120 130 

NCNICRVCAGYFRFKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTG 

I I II I 

MAAKIFCLLMLLG-LSASAATA 
X 10 20 

140 150 160 170 180 190 200 

VCRPWTNCSLDGRSVLKTGTTEKDV--VCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLAL 

i ii i i ii i i i i iii mi i 

TIFP--QCSQAPIASLLPPYLSSAVSSVCENP--ILQPYRIQQAIAAG ILPLSPLFLQQSSALLQQL 

30 40 50 60 70 80 

210 220 230 240 250 X 

IFITLLFSVLKWIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL 

II II I II II 

PLVHLL--AQNIR AQQLQQLVLANLAAYSQQQQFLPFNQLGSLNSASYLQQQQLPFSQLPAAYPSQF 

90 100 110 120 130 X 140 

LPFNQLAALNSPAYLQQQQLLPFSQLAGVSPAT 
150 160 170 180 


9. ELLIS-012-FIG2AB.PEP (1-256) 

ZEA1_MAIZE ZEIN-ALPHA PRECURSOR (19 KD) (CLONE A30). 

ID ZEA1_MAIZE STANDARD; PRT; 234 AA. 


AC P02859; 

DT 21-JUL-1986 (REL. 01, CREATED) 

DT 13-AUG-1987 (REL. 05, LAST SEQUENCE UPDATE) 

DT 01-AUG-1988 (REL. 08, LAST ANNOTATION UPDATE) 

DE ZEIN-ALPHA PRECURSOR (19 KD) (CLONE A30). 

OS ZEA MAYS (MAIZE). 

OC EUKARYOTA; PLANTA; EMBRYOPHYTA; ANGIOSPERMAE; MONOCOTYLEDONEAE; 

OC CYPERALES; GRAMINEAE. 

RN [1] 

RP SEQUENCE FROM N.A. 

RM 82081837 

RA GERAGHTY D., PEIFER M.A., RUBENSTEIN I., MESSING J.; 

RL NUCLEIC ACIDS RES. 9:5163-5174(1981). 

RN [2] 

RP SEQUENCE FROM N.A. 

RM 84207882 

RA HU N.T., PEIFER M.A., HEIDECKER G., MESSING J., RUBENSTEIN I.; 

RL EMBO J. 1:1337-1342(1982). 

CC -!- FUNCTION: ZEINS ARE MAJOR SEED STORAGE PROTEINS. 

CC -!- THE ALPHA ZEINS OF 19 KD AND 22 KD ACCOUNT FOR 70% OF THE TOTAL 
CC ZEIN FRACTION. THEY ARE ENCODED BY A LARGE MULTIGENE FAMILY. 

CC -!- STRUCTURALLY, 22K AND 19K ZEINS ARE COMPOSED OF NINE ADJACENT, 

CC TOPOLOGICALLY ANTIPARALLEL HELICES CLUSTERED WITHIN A DISTORTED 

CC CYLINDER. 

DR EMBL; V01481; ZMZEIN. 

DR PIR; C22762; ZIZM3. 

DR PIR; S21970; S21970. 

KW SEED STORAGE PROTEIN; TANDEM REPEAT; MULTIGENE FAMILY; SIGNAL. 

FT SIGNAL 1 21 

FT CHAIN 22 234 ZEIN-ALPHA. 

SQ SEQUENCE 234 AA; 25403 MW; 260041 CN; 


Initial Score = 10 Optimized Score = 23 Significance = 3.43 

Residue Identity = 22% Matches = 33 Mismatches = 97 

Gaps = 19 Conservative Substitutions = 0 


60 70 80 90 100 110 120 130 

NCNICRVCAGYFRFKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTG 


MAAKIFCLLHLLG-LSASAATA 
X 10 20 


140 150 160 170 180 190 200 

VCRPWTNCSLDGRSVLKTGTTEKDV--VCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLAL 

I II I I II I I I I III till I 

TIFP--QCSQAPIASLLPPYLSPAVSSVCENP--ILQPYRIQQAIAAG ILPLSPLFLQQSSALLQQL 

30 40 50 60 70 80 


210 220 230 240 250 X 

IFITLLFSVLKWIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL 

II II I II II 

PLVHLL--AQNIR AQQLQQLVLANLAAYSQQQQFLPFNQLAALNSASYLQQQQLPFSQLPAAYPQQF 

90 100 110 120 130 X 140 

LPFNQLAALNSPAYLQQQQLLPFSQLAGVSPAT 
150 160 170 180 


10. ELLIS-012-FIG2AB.PEP (1-256) 

ZEAC_MAIZE ZEIN-ALPHA PRECURSOR (19 KD) (PMS1). 


ID ZEAC_MAIZE STANDARD; PRT; 235 AA. 

AC P24449; 

DT 01-MAR-1992 (REL. 21, CREATED) 

DT 01-MAR-1992 (REL. 21, LAST SEQUENCE UPDATE) 

DT 01-MAR-1992 (REL. 21, LAST ANNOTATION UPDATE) 


DE ZEIN-ALPHA PRECURSOR (19 KD) (PMS1). 

GN ZMPMS1. 

OS ZEA MAYS (MAIZE). 

OC EUKARYOTA; PLANTA; EMBRYOPHYTA; ANGIOSPERMAE; MONOCOTYLEDONEAE; 

OC CYPERALES; GRAMINEAE. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=CV. A619; 

RM 90060774 

RA QUAYLE T.J.A., BROWN J.W.S., FEIX G.; 
RL GENE 80:249-257(1989). 

CC -!- FUNCTION: ZEINS ARE MAJOR SEED STORAGE PROTEINS. 

CC -!- THE ALPHA ZEINS OF 19 KD AND 22 KD ACCOUNT FOR 70% OF THE TOTAL 

CC ZEIN FRACTION. THEY ARE ENCODED BY A LARGE MULTIGENE FAMILY. 

CC -!- STRUCTURALLY, 22K AND 19K ZEINS ARE COMPOSED OF NINE ADJACENT, 

CC TOPOLOGICALLY ANTIPARALLEL HELICES CLUSTERED WITHIN A DISTORTED 
CC CYLINDER. 

DR EMBL; X53582; ZMPMS1G. 

DR PIR; S15655; S15655. 

KW SEED STORAGE PROTEIN; TANDEM REPEAT; MULTIGENE FAMILY; SIGNAL. 

FT SIGNAL 1 21 BY SIMILARITY. 

FT CHAIN 22 235 ZEIN-ALPHA. 

SQ SEQUENCE 235 AA; 25505 MW; 262683 CN; 

Initial Score = 10 Optimized Score = 23 Significance = 3.43 

Residue Identity = 22% Matches = 33 Mismatches = 97 

Gaps = 19 Conservative Substitutions = 0 

60 70 80 90 100 110 120 130 

NCNICRVCAGYFRFKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTG 

I I II I 

MAAKIFCLLMLLG-LSASAATA 
X 10 20 

140 150 160 170 180 190 200 

VCRPWTNCSLDGRSVLKTGTTEKDV--VCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLAL 

i ii i i mi i i i iii mi i 

TIFP--QCSQAPIASLLPPYLSPAVSSVCENP--ILQPYRIQQAIAAG ILPLSPLFLQQSSALLQQL 

30 40 50 60 70 80 


210 220 230 240 250 X 

IFITLLFSVLKWIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL 

II II I II II 

PLVHLL--AQNIR AQQLQQLVLANVAAYSQQQQFLPFNQLAALNSAAYLQQQQLLPFSQLTAAYPQQ 

90 100 110 120 130 X 140 

FLPFNQLAALNSAAYLQQQQLLPFSQLAVVSPA 
150 160 170 180 


11. ELLIS-012-FIG2AB.PEP (1-256) 

ZEA2_MAIZE ZEIN-ALPHA PRECURSOR (19 KD) (CLONE ZG99). 

ID ZEA2_MAIZE STANDARD; PRT; 235 AA. 

AC P04704; 

DT 13-AUG-1987 (REL. 05, CREATED) 

DT 13-AUG-1987 (REL. 05, LAST SEQUENCE UPDATE) 

DT 01-NOV-1988 (REL. 09, LAST ANNOTATION UPDATE) 

DE ZEIN-ALPHA PRECURSOR (19 KD) (CLONE ZG99). 

OS ZEA MAYS (MAIZE). 

OC EUKARYOTA; PLANTA; EMBRYOPHYTA; ANGIOSPERMAE; MONOCOTYLEDONEAE; 

OC CYPERALES; GRAMINEAE. 

RN [1] 

RP SEQUENCE FROM N.A. 

RM 82265740 

RA MARKS M.D., LARKINS B.A.; 



RL J. BIOL. CHEM. 257:9976-9983(1982). 

RN [2] 

RP SEQUENCE FROM N.A. 

RM 83103094 

RA PEDERSEN K., DEVEREUX J., WILSON D.R., SHELDON E., LARKINS B.A.; 
RL CELL 29:1015-1026(1982). 

CC -!- FUNCTION: ZEINS ARE MAJOR SEED STORAGE PROTEINS. 

CC -!- THE ALPHA ZEINS OF 19 KD AND 22 KD ACCOUNT FOR 70% OF THE TOTAL 
CC ZEIN FRACTION. THEY ARE ENCODED BY A LARGE MULTIGENE FAMILY. 

CC -!- STRUCTURALLY, 22K AND 19K ZEINS ARE COMPOSED OF NINE ADJACENT, 

CC TOPOLOGICALLY ANTIPARALLEL HELICES CLUSTERED WITHIN A DISTORTED 

CC CYLINDER. 

DR EMBL; V01470; ZMZE01. 

DR EMBL; V01479; ZMZE10. 

DR PIR; A29288; ZIZM99. 

KW SEED STORAGE PROTEIN; TANDEM REPEAT; MULTIGENE FAMILY; SIGNAL. 

FT SIGNAL 1 21 

FT CHAIN 22 235 ZEIN-ALPHA. 

SQ SEQUENCE 235 AA; 25575 MW; 261593 CN; 


Initial Score = 10 Optimized Score = 22 Significance = 3.43 

Residue Identity = 21% Matches = 32 Mismatches = 98 

Gaps = 19 Conservative Substitutions = 0 


60 70 80 90 100 110 120 130 

NCNICRVCAGYFRFKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTG 


MAAKIFCLIMLLG-LSASAATA 
X 10 20 

140 150 160 170 180 190 200 

VCRPWTNCSLDGRSVL--KTGTTEKDVVCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLAL 


SIFP--QCSQAPIASLLPPYLSPAMSSVCENP--ILLPYRIQQAIAAG--ILPLSPLFLQQSSALLQQL 
30 40 50 60 70 80 


210 220 230 240 250 X 

IFITLLFSVLKWIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL 

II I! I II II 

PLVHLL--AQNIR AQQLQQLVLANLAAYSQQQQFLPFNQLAALNSAAYLQQQQLLPFSQLAAAYPRQ 

90 100 110 120 130 X 140 


FLPFNQLAALNSHAYVQQQQLLPFSQLAAVSPA 
150 160 170 180 


12. ELLIS-012-FIG2AB.PEP (1-256) 

ZEAL_MAIZE ZEIN-ALPHA PRECURSOR (CLONE Z4). 

ID ZEAL_MAIZE STANDARD; PRT; 253 AA. 

AC P04701; 

DT 13-AUG-1987 (REL. 05, CREATED) 

DT 13-AUG-1987 (REL. 05, LAST SEQUENCE UPDATE) 

DT 01-AUG-1988 (REL. 08, LAST ANNOTATION UPDATE) 

DE ZEIN-ALPHA PRECURSOR (CLONE Z4). 

OS ZEA MAYS (MAIZE). 

OC EUKARYOTA; PLANTA; EMBRYOPHYTA; ANGIOSPERMAE; MONOCOTYLEDONEAE; 

OC CYPERALES; GRAMINEAE. 

RN [1] 

RP SEQUENCE FROM N.A. 

RM 84207882 

RA HU N.T., PEIFER M.A., HEIDECKER G., MESSING J., RUBENSTEIN I.; 

RL EMBO J. 1:1337-1342(1982). 

CC -!- FUNCTION: ZEINS ARE MAJOR SEED STORAGE PROTEINS. 

CC -!- THE ALPHA ZEINS OF 19 KD AND 22 KD ACCOUNT FOR 70% OF THE TOTAL 

CC ZEIN FRACTION. THEY ARE ENCODED BY A LARGE MULTIGENE FAMILY. 



DR EMBL; V01472; ZMZE03. 

KW SEED STORAGE PROTEIN; TANDEM REPEAT; MULTIGENE FAMILY; SIGNAL. 

FT SIGNAL 1 21 

FT CHAIN 22 253 ZEIN-ALPHA. 

SQ SEQUENCE 253 AA; 27700 MW; 300631 CN; 


Initial Score = 10 Optimized Score = 21 Significance = 3.43 

Residue Identity = 19% Matches = 29 Mismatches = 105 

Gaps = 15 Conservative Substitutions = 0 


60 70 80 90 100 110 120 130 

NCNICRVCAGYFRFKKFCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTG 


MAAKIFCLIMLLG-LSASAATA 
X 10 20 


140 150 160 170 180 190 200 

VCRPWTNCSLDGRSVL--KTGTTEKDVVCGPPVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLAL 

I II I II I I I I III till I 

SIFP--QCSQAPIASLLPPYLSPAMSSVCENP--ILLPYRIQQAIAAG ILPLSPLFLQQSSALLQQL 

30 40 50 60 70 80 


210 220 230 240 250 X 

IFITLLFSVLKWIRKKFPHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL 

II II I I 

PLVHLL---AQNIRAQQLQQLVLANLAAYSQQQQLPLVHLLAQNIRAQQLQQLVLANLAAYSQQQQFLPFNQ 
90 100 110 120 130 X 140 150 

LAALNSAAYLQQQQLLPFSQLAAAYPRQFLPFN 
160 170 180 


13. ELLIS-012-FIG2AB.PEP (1-256) 

MO34_MOUSE MOV34 PROTEIN. 

ID MO34_MOUSE STANDARD; PRT; 321 AA. 

AC P26516; 

DT 01-AUG-1992 (REL. 23, CREATED) 

DT 01-APR-1993 (REL. 25, LAST SEQUENCE UPDATE) 

DT 01-APR-1993 (REL. 25, LAST ANNOTATION UPDATE) 

DE MOV34 PROTEIN. 

GN MOV-34. 

OS MUS MUSCULUS (MOUSE). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA; 
OC EUTHERIA; RODENTIA. 

RN [1] 

RP SEQUENCE FROM N.A. 

RM 91005870 

RA GRIDLEY T., GRAY D.A., ORR-WEAVER T., SORIANO P., BARTON D.E., 

RA FRANCKE U., JAENISCH R.; 

RL DEVELOPMENT 109:235-242(1990). 

RN [2] 

RP SEQUENCE FROM N.A. 

RM 92128931 

RA GRIDLEY T., JAENISCH R., GENDRON-MAGUIRE M.; 

RL GENOMICS 11:501-507(1991). 

CC -!- FUNCTION: MAY PLAY AN IMPORTANT ROLE IN EARLY DEVELOPMENT. 

CC -!- DISEASE: DISRUPTION OF THE MOV-34 LOCUS IS A RECESSIVE EMBRYONIC 
CC LETHAL MUTATION. 

CC -!- SIMILARITY: 62% IDENTITY TO DROSOPHILA MOV34 PROTEIN. 

DR EMBL; M64641; MMMOV34. 

DR EMBL; M64634; MMMOV341. 

DR EMBL; M64635; MMMOV342. 

DR EMBL; M64636; MMMOV343. 

DR EMBL; M64637; MMMOV344. 

np CMPI : MAilATO ! MMMfnnflA 


DR EMBL; M64640; MMMOV347. 

DR PIR; A40556; BHNSV4. 

FT DOMAIN 283 321 HYDROPHILIC. 

SQ SEQUENCE 321 AA; 36540 MW; 520650 CN; 


Initial Score = 10 Optimized Score = 38 Significance = 3.43 

Residue Identity = 17% Matches = 46 Mismatches = 202 

Gaps = 12 Conservative Substitutions = 0 


X 10 20 30 40 

MGNNCYNVVVIVLLLVGCEKVGAVQNSCDNCQPGTFCRKYNPVC 


MPELAVQKVVVHPLVLLSVVDHFNRIGKVGN--QKRVVGVLLGSWQKKVLDVSNS--FAVPFDEDDKDDSVW 
10 20 30 40 50 60 

50 60 70 80 90 100 110 

KSCPPSTFSSIGGQPNCN-ICRVCAGYFRFKK-FCSSTHNAECECIEGFHCLGPQCTRCEKDCRPGQELTKQ 

I I 1 I I I II I 

FLDHDYLENHYGMFKKVNARERI VGWYHTGPKLHKNDI AINELMKRYCPNSVLV I I DVKPKDLGLPTEAYIS 
70 80 90 100 110 120 130 140 


120 130 140 150 160 170 180 

GCKTCSLGTFNDQNGTGVCRPWTNCSLDGRSVLKTGTTEKDVVCGPPVVSFSPSTTISVTPEGGPGGHSL-Q 

II I I II I I I I I I 

VEEVHDDGTPTSKTFEHVTSEIGAEEAEEVGVEHLLRDIKD----TTVGTLSSRITNQVHGLKGLNSKLLDI 
150 160 170 180 190 200 

190 200 210 220 230 240 250 X 

VLTLFLALTSALLLALIFITLLFSVLKWIRKKFPHIF-KQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL 

I I I I I I I II I 

RSYLEKVASGKLPINHQIIYQLQDVFNLLPDASLQEFVKAFYLKTNDQMWVYLASLIRSVVALHNLINNKI 
210 220 230 240 250 260 270 280 

ANRDAEKKEGQEKEESKKERKDDKEKEKSDAAKKEEKKEKK 
290 300 310 320 


14. ELLIS-012-FIG2AB.PEP (1-256) 

ATPB_SULAC MEMBRANE-ASSOCIATED ATPASE BETA CHAIN (EC 3.6.1.34 

ID ATPB_SULAC STANDARD; PRT; 465 AA. 

AC P13052; 

DT 01-JAN-1990 (REL. 13, CREATED) 

DT 01-JAN-1990 (REL. 13, LAST SEQUENCE UPDATE) 

DT 01-MAY-1992 (REL. 22, LAST ANNOTATION UPDATE) 

DE MEMBRANE-ASSOCIATED ATPASE BETA CHAIN (EC 3.6.1.34) (SUL-ATPASE BETA). 
GN ATPB. 

OS SULFOLOBUS ACIDOCALDARIUS. 

OC PROKARYOTA; MENDOSICUTES; ARCHAEBACTERIA; SULFOLOBALES. 
RN [1] 

RP SEQUENCE FROM N.A. 

RM 89034240 

RA DENDA K., KONISHI J., OSHIMA T., DATE T., YOSHIDA M.; 

RL J. BIOL. CHEM. 263:17251-17254(1988). 

CC -!- THIS IS A REGULATORY SUBUNIT. 

CC -!- SUBUNIT: SUL-ATPASE IS COMPOSED OF SIX (OR FIVE ?) SUBUNITS: 

CC ALPHA, BETA, DELTA, GAMMA, C (PROTEOLIPID) , AND POSSIBLY EPSILON. 
CC -!- SIMILARITY: STRONG TO OTHER ARCHEBACTERIA BETA SUBUNITS, ALSO 
CC RELATED TO THE ALPHA SUBUNITS OF F0-F1 ATPASES. 

DR EMBL; M22402; SAATPB. 

DR PIR; A32118; A32118. 

DR PROSITE; PS00152; ATPASE_ALPHA_BETA. 

KW HYDROLASE; HYDROGEN ION TRANSPORT. 

SQ SEQUENCE 465 AA; 51247 MW; 1080510 CN; 


Residue Identity = 17% Matches = 49 Mismatches = 204 

Gaps = 21 Conservative Substitutions = 0 

X 10 20 30 

MGNNCYNVVVIVLLLVGCEKV--GAVQNSCDNC 

I II I I I I 

MSLLNVREYSNISMIKGPLIAVQGVSDAAYNELVEIEMPDGSKRRGLVVDSQMGVTFVQVFEGTTGISPTGS 
10 20 30 40 50 60 70 

40 50 60 70 80 90 100 

QPGTFCRKYNPVCKSCPPSTFSSIGGQPNCNICRVCAGYFRFKKFCSSTHNAECECIEGFHCLGPQCTRCEK 

I I I I I I I I I I I I 

KVRFLGRGLEVKISEEMLGRIFNPLGEPLDNGPPVIGGEKR-NINGDPINPATREYPEEFIQTGISAIDGLN 


80 90 100 110 120 130 140 

110 120 130 140 150 160 170 


DCRPGQELTKQGCKTCSLGTFNDQ---NGTGVCRPWTNCSLDGRSVLKTGTTEKDVVCGPPVVSFSPSTTISV 

I I I I I I I 

SLLRGSKITDLSGSGLPANTLAAQIAKQATVRGEESNFAVVFAAIGVRYDEALFFRKFFEETGAINRVAMFV 


150 160 170 180 190 200 210 

180 190 200 210 220 230 


TPEGGPGGHSLQVLTLFLALTSALLLA LIFIT — LLFSVLKWIRKKFPHIFKQP-FKKTTG 

I I II II III I II II I III II 

TLANDP--PSLKILTPKTALTLAEYLAFEKDHHVLAILIDHTNYCEALRELSASREEVPGRGGYPGYHYTDL 
220 230 240 250 260 270 280 

240 250 X 

AAQEEDACSCRCPQEEEGGGGGYEL 

I I I 

ATIYERAGKVIGKKGSITQHPILTHPNDDHTHPIPDLTGYITEGQIVLDRSLFNKGIYPPINVLNSLSRLHK 
290 300 310 320 330 340 350 

DGI 

360 


15. ELLIS-012-FIG2AB.PEP (1-256) 

TENA_CHICK TENASCIN PRECURSOR (TN) (HEXABRACHION) (CYTOTACTIN 

ID TENA_CHICK STANDARD; PRT; 1808 AA. 

AC P10039; P13132; 

DT 01-MAR-1989 (REL. 10, CREATED) 

DT 01-MAR-1992 (REL. 21, LAST SEQUENCE UPDATE) 

DT 01-AUG-1992 (REL. 23, LAST ANNOTATION UPDATE) 

DE TENASCIN PRECURSOR (TN) (HEXABRACHION) (CYTOTACTIN) (NEURONECTIN) 

DE (GMEM) (JI) (MIOTENDINOUS ANTIGEN) (GLIOMA-ASSOCIATED-EXTRACELLULAR 

DE MATRIX ANTIGEN) (GP 150-225). 

OS GALLUS GALLUS (CHICKEN). 

OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; AVES; NEOGNATHAE; 
OC GALLIFORMES. 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=EMBRYO; 

RM 90030407 

RA SPRING J., BECK K., CHIQUET-EHRISMANN R.; 

RL CELL 59:325-334(1989). 

RN [2] 

RP SEQUENCE OF 27-722 FROM N.A., AND SEQUENCE OF 79-96. 

RC TISSUE=FIBROBLAST; 

RM 89030589 

RA PEARSON C.A., PEARSON D., SHIBAHARA S., HOFSTEENGE J., 

RA CHIQUET-EHRISMANN R.; 

RL EMBO J. 7:2977-2982(1988). 

RN [3] 

op ccniiDMrc nr Ahin iih.uh conH m a 


RM 88176910 

RA JONES F.S., BURGOON M.P., HOFFMAN S., CROSSIN K.L., CUNNINGHAM B.A., 

RA EDELMAN G.M.; 

RL PROC. NATL. ACAD. SCI. U.S.A. 85:2186-2190(1988). 

CC -!- FUNCTION: SAM (SUBSTRATE-ADHESION MOLECULE) THAT APPEARS TO 
CC INHIBIT CELL MIGRATION. MAY PLAY A ROLE IN SUPPORTING THE GROWTH 

CC OF EPITHELIAL TUMORS. 

CC -!- SUBUNIT: HEXAMERIC. AN HOMOTRIMER MAY BE FORMED IN THE TRIPLE 
CC COILED-COIL REGION AND MAY BE STABILIZED BY DISULFIDE RINGS AT 

CC BOTH ENDS. TWO OF SUCH HALF-HEXABRACHIONS MAY BE DISULFIDE LINKED 
CC WITHIN THE CENTRAL GLOBULE. 

CC -!- INDUCTION: BY TGF-BETA. 

CC -!- SUBCELLULAR LOCATION: EXTRACELLULAR MATRIX. 

CC -!- ALTERNATIVE SPLICING: THREE VARIANTS OF 230 KD, 200 KD, AND 190 KD 

CC ARE PRODUCED FROM A SINGLE GENE IN A TISSUE- AND TIME-SPECIFIC 
CC MANNER DURING DEVELOPMENT. 

CC -!- SIMILARITY: INCLUDES 13.5 EGF-LIKE REPEATS AND 11 FIBRONECTIN 
CC TYPE III-LIKE DOMAINS. 

DR EMBL; M23121; GGTEN. 

DR EMBL; X08031; GGTENAS1. 

DR EMBL; X08030; GGTENAS8. 

DR EMBL; J03641; GGCYTT1. 

DR EMBL; M20816; GGCYTT2. 

DR PIR; A30903; A30903. 

DR PIR; A31930; A31930. 

DR PIR; A33379; A33379. 

DR PIR; B33379; B33379. 

DR PIR; C33379; C33379. 

DR PIR; S01292; S01292. 

DR PROSITE; PS00022; EGF. 

KW GLYCOPROTEIN; CELL ADHESION; TANDEM REPEAT; EGF-LIKE DOMAIN; 

KW EXTRACELLULAR MATRIX; SIGNAL; ALTERNATIVE SPLICING. 


FT   SIGNAL        1     22
FT   PROPEP       23     33
FT   CHAIN        34   1808       TENASCIN.
FT   DOMAIN      119    147       4 HEPTAD REPEATS (PROBABLE COILED COIL)
FT   DOMAIN      176    590       13.5 EGF-TYPE REPEATS.
FT   REPEAT      176    187       EGF-LIKE 0 (PARTIAL).
FT   REPEAT      187    218       EGF-LIKE 1.
FT   REPEAT      219    249       EGF-LIKE 2.
FT   REPEAT      250    280       EGF-LIKE 3.
FT   REPEAT      281    311       EGF-LIKE 4.
FT   REPEAT      312    342       EGF-LIKE 5.
FT   REPEAT      343    373       EGF-LIKE 6.
FT   REPEAT      374    404       EGF-LIKE 7.
FT   REPEAT      405    435       EGF-LIKE 8.
FT   REPEAT      436    466       EGF-LIKE 9.
FT   REPEAT      467    497       EGF-LIKE 10.
FT   REPEAT      498    528       EGF-LIKE 11.
FT   REPEAT      529    559       EGF-LIKE 12.
FT   REPEAT      560    590       EGF-LIKE 13.
FT   DOMAIN      591    680       FIBRONECTIN TYPE-III 1.
FT   DOMAIN      681    771       FIBRONECTIN TYPE-III 2.
FT   DOMAIN      772    862       FIBRONECTIN TYPE-III 3.
FT   DOMAIN      863    954       FIBRONECTIN TYPE-III 4.
FT   DOMAIN      955   1042       FIBRONECTIN TYPE-III 5.
FT   DOMAIN     1043   1133       FIBRONECTIN TYPE-III 6.
FT   DOMAIN     1134   1224       FIBRONECTIN TYPE-III 7.
FT   DOMAIN     1225   1315       FIBRONECTIN TYPE-III 8.
FT   DOMAIN     1316   1404       FIBRONECTIN TYPE-III 9.
FT   DOMAIN     1405   1492       FIBRONECTIN TYPE-III 10.
FT   DOMAIN     1493   1580       FIBRONECTIN TYPE-III 11.
FT   SIMILAR    1589   1808       TO THE GLOBULAR DOMAIN OF THE BETA- AND
FT                                GAMMA-CHAINS OF FIBRINOGEN.
FT   VARSPLIC   1043   1224       MISSING (IN 200 KD FORM).


CT UADCD! T r 


win 


nn 


MTCOTMr mi ion i/n cnoMi 


FT   DISULFID     64     64       INTERCHAIN (POTENTIAL).
FT   CARBOHYD     38     38       POTENTIAL.
FT   CARBOHYD    168    168       POTENTIAL.
FT   CARBOHYD    186    186       POTENTIAL.
FT   CARBOHYD    328    328       POTENTIAL.
FT   CARBOHYD    603    603       POTENTIAL.
FT   CARBOHYD    643    643       POTENTIAL.
FT   CARBOHYD    751    751       POTENTIAL.
FT   CARBOHYD    759    759       POTENTIAL.
FT   CARBOHYD   1050   1050       POTENTIAL.
FT   CARBOHYD   1090   1090       POTENTIAL.
FT   CARBOHYD   1101   1101       POTENTIAL.
FT   CARBOHYD   1112   1112       POTENTIAL.
FT   CARBOHYD   1153   1153       POTENTIAL.
FT   CARBOHYD   1183   1183       POTENTIAL.
FT   CARBOHYD   1416   1416       POTENTIAL.
FT   CARBOHYD   1736   1736       POTENTIAL.
FT   CARBOHYD   1769   1769       POTENTIAL.
FT   CONFLICT    563    571       SCPNDCNNV -> PAPMTATTW (IN REF. 3)
FT   CONFLICT    598    598       E -> G (IN REF. 3).
FT   CONFLICT    840    840       Y -> YEY (IN REF. 3).
SQ   SEQUENCE  1808 AA;  198858 MW;  1.656738E+07 CN;


Initial Score = 10 Optimized Score = 36 Significance = 3.43 

Residue Identity = 17% Matches = 48 Mismatches = 202 

Gaps = 24 Conservative Substitutions = 0 

X 10 20 

MGNNCYNVVVIVLLLVGCEKVG 

I I I 

DCFDRGRCINGTCFCEEGYTGEDCGELTCPNNCNGNGRCENGLCVCHEGFVGDDCSQKRCPKDCNNRGHCV- 
320 330 340 350 360 370 380 

30 40 50 60 70 80 

AVQNSCDNCQPGTFCRKYNPVCKSCPPSTFSSIGGQPNC---NICRVCAGYFRFKKFC---SSTHNAECECI 

I II I I II I I II I I I III 

DGRCVCHEGYLGEDC--GELRCPNDCHNRGRCINGQCVCDEGFIGEDC-GELRCPNDCHNRGRCVNGQCECH 
390 400 410 420 430 440 450 

90 100 110 120 130 140 150 

EGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGTFNDQNGTGVCRPWTN--CSLDGRSVLKTGTTEKDVVC 

III II II I II I II I I I I 

EGFIGEDCGELRCPNDCNSHGRCVNGQCVCDEGYTGEDCGELRCPNDCHNRGRCVEGRCVCDNGFMGED--C 


460 470 480 490 500 510 520 

160 170 180 190 200 210 220 


G PPVVSFSPSTTISVTPEGGPG GHSLQVLTLFLALTSALLLALIFITLLFSVLKWIRKKFPH 


GELSCPNDCHQHGRCVDGRCVCHEGFTGEDCRERSCPNDCNNVGRCVEGRCVCEEGYHGIDCSDVSPPTELT 
530 540 550 560 570 580 590 600 

230 240 250 X 

IFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL 

II II I 

VTNVTDKTVNLENKHENLVNEYLVTYVPTSSGGLDLQFTVPGNQTSATIHELEPGVEYFIRVFAILKNKKSI 
610 620 630 X 640 650 660 670 

PVSARVATYLPAPE 

680 



> 0 < 

0| |0 IntelliGenetics 

> 0 < 

FastDB - Fast Pair-wise Comparison of Sequences 
Release 5.4 

Results file ellis-012-fig2ab.res made by shears on Tue 14 Sep 93 18:08:07-PDT. 


Query sequence being compared: ELLIS-012-FIG2AB.SEQ (1-2350) 

Number of sequences searched: 144007 

Number of scores above cutoff: 3862 

Results of the initial comparison of ELLIS-012-FIG2AB.SEQ (1-2350) with: 
Data bank : EMBL-NEW 7, all entries 
Data bank : GenBank 77, all entries 
Data bank : GenBank-NEW 6, all entries 
Data bank : UEMBL 35_77, all entries 

[Histogram of the score distribution: NUMBER OF SEQUENCES (vertical axis, logarithmic, 0 to 100000) against SCORE (horizontal axis, 0 to 2088 in steps of 261), with a STDEV scale beneath the score axis.] 



PARAMETERS 

Similarity matrix        Unitary    K-tuple                  4 
Mismatch penalty         1          Joining penalty         30 
Gap penalty              1.00       Window size             32 
Gap size penalty         0.33 
Cutoff score             10 
Randomization group      0 

Initial scores to save   40         Alignments to save      15 
Optimized scores to save 0          Display context         50 


SEARCH STATISTICS 

Scores:      Mean           Median          Standard Deviation 
             36             36              14.60 

Times:       CPU            Total Elapsed 
             02:07:00.01    03:17:05.00 

Number of residues:              169341811 
Number of sequences searched:    144007 
Number of scores above cutoff:   3862 



Cut-off raised to 30. 

Cut-off raised to 35. 

Cut-off raised to 39. 

Cut-off raised to 43. 

Cut-off raised to 47. 

Cut-off raised to 50. 

Cut-off raised to 53. 

Cut-off raised to 56. 

Cut-off raised to 59. 

Cut-off raised to 62. 

Cut-off raised to 65. 

Cut-off raised to 68. 

The scores below are sorted by initial score. 

Significance is calculated based on initial score. 

A 100% identical sequence to the query sequence was not found. 
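
The Sig. values in this listing appear to express each initial score as a number of standard deviations above the mean score of the search (mean 36 and standard deviation 14.60 under SEARCH STATISTICS for this run). That reading is an assumption checked only against the printed values; a minimal Python sketch, using the hypothetical helper name `significance`, is:

```python
def significance(initial_score: float, mean: float, stdev: float) -> float:
    """Number of standard deviations by which a score exceeds the mean.

    Assumption: FastDB's Sig. column is a plain z-score of the initial
    score. This is inferred from the listing, not from FastDB documentation.
    """
    if stdev <= 0:
        raise ValueError("standard deviation must be positive")
    return (initial_score - mean) / stdev


# Mean and standard deviation as printed under SEARCH STATISTICS (rounded).
MEAN, STDEV = 36.0, 14.60

for name, score in [("MUSTC41BB", 2349), ("HUMILAX", 412), ("CBRR5A", 163)]:
    print(f"{name}: {significance(score, MEAN, STDEV):.2f}")
```

Because the printed mean is rounded to a whole number, the recomputed value can differ from the listed one in the last digit (158.42 here against the listed 158.43 for MUSTC41BB).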


The list of best scores is: 

                                                        Init.   Opt. 
     Sequence Name  Description                  Length Score  Score    Sig. Frame 

            **** 158 standard deviations above mean **** 
  1. MUSTC41BB   Mouse T-cell receptor 4-1BB p     2350  2349   2349  158.43   0 
            **** 25 standard deviations above mean **** 
  2. HUMILAX     Human activation dependent T      1419   412    806   25.75   0 
            **** 8 standard deviations above mean **** 
  3. CBRR5A      Caenorhabditis briggsae DNA f      944   163    406    8.70   0 
  4. CLS88D0     Hamster EcoRI donor DNA fragm     3906   162   1019    8.63   0 
  5. XELAEIP     X.laevis amidating enzyme (AE     2733   157    960    8.29   0 
            **** 7 standard deviations above mean **** 
  6. HUMUT5094   Human chromosome 4 STS UT5094      468   152    210    7.95   0 
  7. S53907      XRAR alpha 2=retinoic acid re     3240   151    961    7.88   0 
  8. PFASXC      Plasmodium falciparum sexual      2306   150    808    7.81   0 
  9. ACLRGNAL    A.laidlawii 16S ribosomal RNA     1508   146    568    7.53   0 
 10. HUMBIND     Human binding protein mRNA, p     3523   145    772    7.47   0 
 11. HSBIND      Human binding protein mRNA, p     3523   145    772    7.47   0 
 12. HSHB15RNA   Homo sapiens mRNA for HB15        1761   143    697    7.33   0 
 13. S53354      B-cell activation protein=B-G     2574   143    897    7.33   0 
 14. HSIL05      Human interleukin-2 (IL-2) ge     6684   142    737    7.26   0 
 15. RATTGFB     Rat transforming growth facto     6244   141    993    7.19   0 
            **** 6 standard deviations above mean **** 
 16. PFAHRKPM    P.cynomolgi DNA homologous to     1563   136    636    6.85   0 
 17. HUMPALF1    Human mutant prealbumin gene      1913   136    798    6.85   0 
 18. HUMAMYLOID  Homo sapiens amyloid protein      3725   136    757    6.85   0 
 19. HUMPALD     Human prealbumin gene, comple     7616   136    944    6.85   0 
 20. AMVCP       Arabis mosaic virus RNA-2, 3'     2406   135    918    6.78   0 
 21. HUMPALC     Human serum prealbumin gene.      7619   135    945    6.78   0 
 22. SCCHRIII    S.cerevisiae chromosome III c   315338   133    975    6.64   0 
 23. DROFATFA    Fruitfly fat facets mRNA.         8473   131    996    6.51   0 
 24. DROFATFB    Fruitfly fat facets mRNA.         8891   131    996    6.51   0 
 25. PIGFSHB     Pig follicle stimulating horm      929   130    398    6.44   0 
 26. ATGRPG      A.thaliana genes encoding gly     9619   130    962    6.44   0 
 27. MMUPA       M.musculus upstream region of     4431   129    879    6.37   0 
 28. MMGCSF      Mouse granulocyte colony-stim     1363   127    587    6.23   0 
 29. OCPMA1      O.cuniculus PMCA1 gene for pi     4479   126    800    6.16   0 
 30. S56304S1    AADC=aromatic L-amino acid de     1314   125    550    6.10   0 
 31. STAPT48CG   Plasmid pT48 (from S. aureus)     2475   125    713    6.10   0 
 32. RATOLFPROL  Rat olfactory protein mRNA, c      984   124    394    6.03   0 
 33. MMUPAACT    Mouse gene for urokinase plas      986   124    426    6.03   0 
 34. HUMHTF4     Human helix-loop-helix protei     2942   124    897    6.03   0 
 35. MUSFABPI    Mouse Fabpi gene, exons 1-4.      5039   124    843    6.03   0 
            **** 5 standard deviations above mean **** 
 36. CEHER1GNA   C.elegans her-1 gene              6932   123    972    5.96   0 
 37. YSCMTAT92   yeast (s.cerevisiae) mitochon      365   121    179    5.82   0 
 38. M75767      CEL02A3S2 Caenorhabditis eleg      388   121    164    5.82   0 
 39. SCSPP91A    S.cerevisiae SPP91 gene           1665   121    698    5.82   0 
 40. YSCPRP21A   Saccharomyces cerevisiae nucl     2180   121    709    5.82   0 


1. ELLIS-012-FIG2AB.SEQ (1-2350) 

MUSTC41BB Mouse T-cell receptor 4-1BB protein mRNA, complete 


LOCUS       MUSTC41BB    2350 bp ss-mRNA            ROD       15-SEP-1989
DEFINITION  Mouse T-cell receptor 4-1BB protein mRNA, complete cds.
ACCESSION   J04492
KEYWORDS    T-cell receptor.
SOURCE      Mouse (strain C57BL/6) T-lymphocyte cell lines L2 and L3, cDNA to
            mRNA.
  ORGANISM  Mus musculus
            Eukaryota; Animalia; Chordata; Vertebrata; Mammalia; Theria;
            Eutheria; Rodentia; Myomorpha; Muridae; Murinae.
REFERENCE   1 (bases 1 to 2350)
  AUTHORS   Kwon,B.S. and Weissman,S.M.
  TITLE     cDNA sequences of two inducible T-cell genes
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 86, 1963-1967 (1989)
  STANDARD  full automatic
COMMENT     Draft entry and clean copy of sequence for [1] kindly provided by
            B.S.Kwon, 17-MAR-1989.
FEATURES             Location/Qualifiers
     sig_peptide     146..214
                     /codon_start=1
                     /note="4-1BB protein signal peptide"
     mat_peptide     215..913
                     /codon_start=1
                     /note="4-1BB protein"
     CDS             146..916
                     /note="4-1BB protein precursor"
                     /codon_start=1
                     /translation="MGNNCYNVVVIVLLLVGCEKVGAVQNSCDNCQPGTFCRKYNPVC
                     KSCPPSTFSSIGGQPNCNICRVCAGYFRFKKFCSSTHNAECECIEGFHCLGPQCTRCE
                     KDCRPGQELTKQGCKTCSLGTFNDQNGTGVCRPWTNCSLDGRSVLKTGTTEKDVVCGP
                     PVVSFSPSTTISVTPEGGPGGHSLQVLTLFLALTSALLLALIFITLLFSVLKWIRKKF
                     PHIFKQPFKKTTGAAQEEDACSCRCPQEEEGGGGGYEL"
BASE COUNT      590 a    561 c    589 g    607 t      3 others
ORIGIN      Unreported.

Initial Score = 2349 Optimized Score = 2349 Significance = 158.43 

Residue Identity = 99% Matches = 2349 Mismatches = 1 

Gaps = 0 Conservative Substitutions = 0 

X 10 20 30 40 50 60 70 

ATGTCCATGAACTGCTGAGTGGATAAACAGCACGGGATATCTCTGTCTAAAGGAATATTACTACACCAGGAA 
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
ATGTCCATGAACTGCTGAGTGGATAAACAGCACGGGATATCTCTGTCTAAAGGAATATTACTACACCAGGAA 
X 10 20 30 40 50 60 70 



80 90 100 110 120 130 140 

AAGGACACATTCGACAACAGGAAAGGAGCCTGTCACAGAAAACCACAGTGTCCTGTGCATGTGACATTTCGC 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
AAGGACACATTCGACAACAGGAAAGGAGCCTGTCACAGAAAACCACAGTGTCCTGTGCATGTGACATTTCGC 
80 90 100 110 120 130 140 

150 160 170 180 190 200 210 

CATGGGAAACAACTGTTACAACGTGGTGGTCATTGTGCTGCTGCTAGTGGGCTGTGAGAAGGTGGGAGCCGT 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

CATGGGAAACAACTGTTACAACGTGGTGGTCATTGTGCTGCTGCTAGTGGGCTGTGAGAAGGTGGGAGCCGT 
150 160 170 180 190 200 210 

220 230 240 250 260 270 280 

GCAGAACTCCTGTGATAACTGTCAGCCTGGTACTTTCTGCAGAAAATACAATCCAGTCTGCAAGAGCTGCCC 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
GCAGAACTCCTGTGATAACTGTCAGCCTGGTACTTTCTGCAGAAAATACAATCCAGTCTGCAAGAGCTGCCC 
220 230 240 250 260 270 280 

290 300 310 320 330 340 350 360 

TCCAAGTACCTTCTCCAGCATAGGTGGACAGCCGAACTGTAACATCTGCAGAGTGTGTGCAGGCTATTTCAG 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TCCAAGTACCTTCTCCAGCATAGGTGGACAGCCGAACTGTAACATCTGCAGAGTGTGTGCAGGCTATTTCAG 
290 300 310 320 330 340 350 360 

370 380 390 400 410 420 430 

GTTCAAGAAGTTTTGCTCCTCTACCCACAACGCGGAGTGTGAGTGCATTGAAGGATTCCATTGCTTGGGGCC 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

GTTCAAGAAGTTTTGCTCCTCTACCCACAACGCGGAGTGTGAGTGCATTGAAGGATTCCATTGCTTGGGGCC 
370 380 390 400 410 420 430 


440 450 460 470 480 490 500 

ACAGTGCACCAGATGTGAAAAGGACTGCAGGCCTGGCCAGGAGCTAACGAAGCAGGGTTGCAAAACCTGTAG 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ACAGTGCACCAGATGTGAAAAGGACTGCAGGCCTGGCCAGGAGCTAACGAAGCAGGGTTGCAAAACCTGTAG 
440 450 460 470 480 490 500 


510 520 530 540 550 560 570 

CTTGGGAACATTTAATGACCAGAACGGTACTGGCGTCTGTCGACCCTGGACGAACTGCTCTCTAGACGGAAG
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
CTTGGGAACATTTAATGACCAGAACGGTACTGGCGTCTGTCGACCCTGGACGAACTGCTCTCTAGACGGAAG
510 520 530 540 550 560 570 

580 590 600 610 620 630 640 

GTCTGTGCTTAAGACCGGGACCACGGAGAAGGACGTGGTGTGTGGACCCCCTGTGGTGAGCTTCTCTCCCAG 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

GTCTGTGCTTAAGACCGGGACCACGGAGAAGGACGTGGTGTGTGGACCCCCTGTGGTGAGCTTCTCTCCCAG 
580 590 600 610 620 630 640 

650 660 670 680 690 700 710 720

TACCACCATTTCTGTGACTCCAGAGGGAGGACCAGGAGGGCACTCCTTGCAGGTCCTTACCTTGTTCCTGGC
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

TACCACCATTTCTGTGACTCCAGAGGGAGGACCAGGAGGGCACTCCTTGCAGGTCCTTACCTTGTTCCTGGC 
650 660 670 680 690 700 710 720 

730 740 750 760 770 780 790 

GCTGACATCGGCTTTGCTGCTGGCCCTGATCTTCATTACTCTCCTGTTCTCTGTGCTCAAATGGATCAGGAA 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

GCTGACATCGGCTTTGCTGCTGGCCCTGATCTTCATTACTCTCCTGTTCTCTGTGCTCAAATGGATCAGGAA 
730 740 750 760 770 780 790 

800 810 820 830 840 850 860 

AAAATTCCCCCACATATTCAAGCAACCATTTAAGAAGACCACTGGAGCAGCTCAAGAGGAAGATGCTTGTAG 
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
AAAATTCCCCCACATATTCAAGCAACCATTTAAGAAGACCACTGGAGCAGCTCAAGAGGAAGATGCTTGTAG 
800 810 820 830 840 850 860 

870 880 890 900 910 920 930 

CTGCCGATGTCCACAGGAAGAAGAAGGAGGAGGAGGAGGCTATGAGCTGTGATGTACTATCCTAGGAGATGT 
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
CTGCCGATGTCCACAGGAAGAAGAAGGAGGAGGAGGAGGCTATGAGCTGTGATGTACTATCCTAGGAGATGT 
870 880 890 900 910 920 930 

940 950 960 970 980 990 1000 

GTGGGCCGAAACCGAGAAGCACTAGGACCCCACCATCCTGTGGAACAGCACAAGCAACCCCACCACCCTGTT 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
GTGGGCCGAAACCGAGAAGCACTAGGACCCCACCATCCTGTGGAACAGCACAAGCAACCCCACCACCCTGTT
940 950 960 970 980 990 1000 

1010 1020 1030 1040 1050 1060 1070 1080 

CTTACACATCATCCTAGATGATGTGTGGGCGCGCACCTCATCCAAGTCTCTTCTAACGCTAACATATTTGTC 
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
CTTACACATCATCCTAGATGATGTGTGGGCGCGCACCTCATCCAAGTCTCTTCTAACGCTAACATATTTGTC 
1010 1020 1030 1040 1050 1060 1070 1080 

1090 1100 1110 1120 1130 1140 1150 

TTTACCTTTTTTAAATCTTTTTTTAAATTTAAATTTTATGTGTGTGAGTGTTTTGCCTGCCTGTATGCACAC 
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TTTACCTTTTTTAAATCTTTTTTTAAATTTAAATTTTATGTGTGTGAGTGTTTTGCCTGCCTGTATGCACAC 
1090 1100 1110 1120 1130 1140 1150 

1160 1170 1180 1190 1200 1210 1220 

GTGTGTGTGTGTGTGTGTGTGACACTCCTGATGCCTGAGGAGGTCAGAAGAGAAAGGGTTGGTTCCATAAGA 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

GTGTGTGTGTGTGTGTGTGTGACACTCCTGATGCCTGAGGAGGTCAGAAGAGAAAGGGTTGGTTCCATAAGA 
1160 1170 1180 1190 1200 1210 1220 

1230 1240 1250 1260 1270 1280 1290 

ACTGGAGTTATGGATGGCTGTGAGCCGGNNNGATAGGTCGGGACGGAGACCTGTCTTCTTATTTTAACGTGA 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

ACTGGAGTTATGGATGGCTGTGAGCCGGNNNGATAGGTCGGGACGGAGACCTGTCTTCTTATTTTAACGTGA 
1230 1240 1250 1260 1270 1280 1290 



1300 1310 1320 1330 1340 1350 1360 

CTGTATAATAAAAAAAAAATGATATTTCGGGAATTGTAGAGATTGTCCTGACACCCTTCTAGTTAATGATCT
|||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||
CTGTATAATAAAAAAAAAATGATATTTCGGGAATTGTAGAGATTCTCCTGACACCCTTCTAGTTAATGATCT
1300 1310 1320 1330 1340 1350 1360 

1370 1380 1390 1400 1410 1420 1430 1440 

AAGAGGAATTGTTGATACGTAGTATACTGTATATGTGTATGTATATGTATATGTATATATAAGACTCTTTTA 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

AAGAGGAATTGTTGATACGTAGTATACTGTATATGTGTATGTATATGTATATGTATATATAAGACTCTTTTA 
1370 1380 1390 1400 1410 1420 1430 1440 

1450 1460 1470 1480 1490 1500 1510 

CTGTCAAAGTCAACCTAGAGTGTCTGGTTACCAGGTCAATTTTATTGGACATTTTACGTCACACACACACAC 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
CTGTCAAAGTCAACCTAGAGTGTCTGGTTACCAGGTCAATTTTATTGGACATTTTACGTCACACACACACAC 
1450 1460 1470 1480 1490 1500 1510 

1520 1530 1540 1550 1560 1570 1580 

ACACACACACACACACGTTTATACTACGTACTGTTATCGGTATTCTACGTCATATAATGGGATAGGGTAAAA 
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ACACACACACACACACGTTTATACTACGTACTGTTATCGGTATTCTACGTCATATAATGGGATAGGGTAAAA 
1520 1530 1540 1550 1560 1570 1580 

1590 1600 1610 1620 1630 1640 1650 

GGAAACCAAAGAGTGAGTGATATTATTGTGGAGGTGACAGACTACCCCTTCTGGGTACGTAGGGACAGACCT 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

GGAAACCAAAGAGTGAGTGATATTATTGTGGAGGTGACAGACTACCCCTTCTGGGTACGTAGGGACAGACCT 
1590 1600 1610 1620 1630 1640 1650 

1660 1670 1680 1690 1700 1710 1720 

CCTTCGGACTGTCTAAAACTCCCCTTAGAAGTCTCGTCAAGTTCCCGGACGAAGAGGACAGAGGAGACACAG 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
CCTTCGGACTGTCTAAAACTCCCCTTAGAAGTCTCGTCAAGTTCCCGGACGAAGAGGACAGAGGAGACACAG 
1660 1670 1680 1690 1700 1710 1720 

1730 1740 1750 1760 1770 1780 1790 1800 

TCCGAAAAGTTATTTTTCCGGCAAATCCTTTCCCTGTTTCGTGACACTCCACCCCTTGTGGACACTTGAGTG 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TCCGAAAAGTTATTTTTCCGGCAAATCCTTTCCCTGTTTCGTGACACTCCACCCCTTGTGGACACTTGAGTG 
1730 1740 1750 1760 1770 1780 1790 1800 

1810 1820 1830 1840 1850 1860 1870 

TCATCCTTGCGCCGGAAGGTCAGGTGGTACCCGTCTGTAGGGGCGGGGAGACAGAGCCGCGGGGGAGCTACG 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TCATCCTTGCGCCGGAAGGTCAGGTGGTACCCGTCTGTAGGGGCGGGGAGACAGAGCCGCGGGGGAGCTACG
1810 1820 1830 1840 1850 1860 1870 

1880 1890 1900 1910 1920 1930 1940 

AGAATCGACTCACAGGGCGCCCCGGGCTTCGCAAATGAAACTTTTTTAATCTCACAAGTTTCGTCCGGGCTC 
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
AGAATCGACTCACAGGGCGCCCCGGGCTTCGCAAATGAAACTTTTTTAATCTCACAAGTTTCGTCCGGGCTC 
1880 1890 1900 1910 1920 1930 1940 

1950 1960 1970 1980 1990 2000 2010 

GGCGGACCTATGGCGTCGATCCTTATTACCTTATCCTGGCGCCAAGATAAAACAACCAAAAGCCTTGACTCC 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
GGCGGACCTATGGCGTCGATCCTTATTACCTTATCCTGGCGCCAAGATAAAACAACCAAAAGCCTTGACTCC
1950 1960 1970 1980 1990 2000 2010 

2020 2030 2040 2050 2060 2070 2080 

GGTACTAATTCTCCCTGCCGGCCCCCGTAAGCATAACGCGGCGATCTCCACTTTAAGAACCTGGCCGCGTTC 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
GGTACTAATTCTCCCTGCCGGCCCCCGTAAGCATAACGCGGCGATCTCCACTTTAAGAACCTGGCCGCGTTC 
2020 2030 2040 2050 2060 2070 2080 



2090 2100 2110 2120 2130 2140 2150 2160 

TGCCTGGTCTCGCTTTCGTAAACGGTTCTTACAAAAGTAATTAGTTCTTGCTTTCAGCCTCCAAGCTTCTGC 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TGCCTGGTCTCGCTTTCGTAAACGGTTCTTACAAAAGTAATTAGTTCTTGCTTTCAGCCTCCAAGCTTCTGC 
2090 2100 2110 2120 2130 2140 2150 2160 

2170 2180 2190 2200 2210 2220 2230 

TAGTCTATGGCAGCATCAAGGCTGGTATTTGCTACGGCTGACCGCTACGCCGCCGCAATAAGGGTACTGGGC 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TAGTCTATGGCAGCATCAAGGCTGGTATTTGCTACGGCTGACCGCTACGCCGCCGCAATAAGGGTACTGGGC
2170 2180 2190 2200 2210 2220 2230

2240 2250 2260 2270 2280 2290 2300 

GGCCCGTCGAAGGCCCTTTGGTTTCAGAAACCCAAGGCCCCCCTCATACCAACGTTTCGACTTTGATTCTTG 

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

GGCCCGTCGAAGGCCCTTTGGTTTCAGAAACCCAAGGCCCCCCTCATACCAACGTTTCGACTTTGATTCTTG 
2240 2250 2260 2270 2280 2290 2300 

2310 2320 2330 2340 X 

CCGGTACGTGGTGGTGGGTGCCTTAGCTCTTTCTCGATAGTTAGAC 

||||||||||||||||||||||||||||||||||||||||||||||

CCGGTACGTGGTGGTGGGTGCCTTAGCTCTTTCTCGATAGTTAGAC 
2310 2320 2330 2340 2350 


2. ELLIS-012-FIG2AB.SEQ (1-2350) 

HUMILAX Human activation dependent T cell mRNA, complete c

LOCUS       HUMILAX 1419 bp ss-mRNA PRI 30-APR-1993
DEFINITION  Human activation dependent T cell mRNA, complete cds.
ACCESSION   L12964
KEYWORDS    cell surface receptor; nerve growth factor receptor;
            tumor necrosis factor receptor.
SOURCE      Homo sapiens cDNA to mRNA.
  ORGANISM  Homo sapiens
            Eukaryota; Animalia; Chordata; Vertebrata; Mammalia; Theria;
            Eutheria; Primates; Haplorhini; Catarrhini; Hominidae.
REFERENCE   1 (bases 1 to 1419)
  AUTHORS   Schwarz,H., Tuckwell,J.E. and Lotz,M.
  TITLE     Nucleotide sequence of ILA, a cDNA encoding a new member of the
            human nerve growth factor/tumor necrosis factor receptor family
  JOURNAL   Unpublished (1993)
  STANDARD  full automatic
FEATURES             Location/Qualifiers
     5'UTR           1..139
     3'UTR           908..1419
     polyA_signal    1369..1374
     polyA_site      1419
     source          1..1419
                     /organism="Homo sapiens"
                     /cell_type="transformed T lymphocyte /cell_line SLB-1"
                     /sequenced_mol="cDNA to mRNA"
     CDS             140..907
                     /gene="ILA"
                     /note="ILA= induced by lymphocyte activation"
                     /codon_start=1
                     /translation="MGNSCYNIVATLLLVLNFERTRSLQDPCSNCPAGTFCDNNRNQI
                     CSPCPPNSFSSAGGQRTCDICRQCKGVFRTRKECSSTSNAECDCTPGFHCLGAGCSMC
                     EQDCRQGQELTKKGCKDCCFGTFNDQKRGICRPWTNCSLDGKSVLVNGTKERDVVCGP
                     SPADLSPGASSVTPPAPAREPGHSPQIISFFLALTSTALLFLLFFLTLRFSVVKRGRK
                     KLLYIFKQPFMRPVQTTQEEDGCSCRFPEEEEGGCEL"
BASE COUNT 373 a 340 c 342 g 364 t
ORIGIN

Initial Score = 412 Optimized Score = 806 Significance = 25.75
Residue Identity = [illegible] Matches = [illegible] Mismatches = [illegible]
Gaps = 161 Conservative Substitutions = 0
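
Note on the feature coordinates: the locations in the two records above can be checked against the lengths of their translations. The following is a minimal sketch, assuming simple `start..end` locations (1-based, inclusive) and a CDS that ends with a stop codon; the function names are illustrative and are not part of the search program.

```python
def feature_length(location: str) -> int:
    """Length in bases of a simple 'start..end' location (1-based, inclusive)."""
    start, end = (int(part) for part in location.split(".."))
    return end - start + 1


def coded_residues(location: str, has_stop: bool = True) -> int:
    """Number of amino acids encoded by a location in frame 1.

    Assumes the span is a whole number of codons; when has_stop is True
    the final codon is treated as a stop codon and is not counted.
    """
    bases = feature_length(location)
    if bases % 3 != 0:
        raise ValueError(f"{location} is not a whole number of codons")
    codons = bases // 3
    return codons - 1 if has_stop else codons


# Coordinates copied from the MUSTC41BB and HUMILAX feature tables.
mouse_precursor = coded_residues("146..916")               # CDS with stop codon
mouse_signal = coded_residues("146..214", has_stop=False)  # sig_peptide
mouse_mature = coded_residues("215..913", has_stop=False)  # mat_peptide
human_cds = coded_residues("140..907")                     # CDS with stop codon

print(mouse_precursor, mouse_signal, mouse_mature, human_cds)
```

Under these assumptions the mouse precursor comes to 256 residues, which agrees with the 256-residue peptide query (1-256) named at the top of the listing, and the signal and mature peptides (23 and 233 residues) add up to the same total.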


X 10 20 30 40 50 60 

ATGTCCATGAACTGCTGAGTGGATAAACAGCACGGGATATCTCTG — TCT — AAAGGAATATTACT- 

illl I II llllll III I I II I I III III I II I III I 
CCACGCGTCCGAG-ACCAAGGAGTGG— AAAGTTCTCCGG-CAGCCCTGAGATCTCAAGAGTGACATTTGTG 
X 10 20 30 40 50 60 

70 80 90 100 110 120 130 

ACACCAG-GAAAAGGA — CACATTCGACAACAGGAAAGGAGCCTGTCACAGAAAACCACAGTGTCCTGT-GC 

i mu ii ii i mi i i it ii in ii i ii i mu ii 

AGACCAGCTAATTTGATTAAAATTC TCTTGGAATCAG-CTTTGCTAG--TATCATA CCTGTCGC 

70 80 90 100 110 120 

140 150 160 170 180 190 200 

ATGTGACATTTCGCCATGGGAAACAACTGTTACAACGTGGTGGTCATTGTGCTGCTGCTAGTGGGCTGTGAG 

i ii mi minimi imiimi i n i n 1 n mu i i n mi 

A — GA--TTTCATCATGGGAAACAGCTGTTACAACATAGTAGCCACTCTGTTGCTGGTCCTCAACTTTGAG 
130 140 150 160 170 180 190 

210 220 230 240 250 260 270 

AAGGTGGGAGCCGTGCAGAACTCCTGTGATAACTGTCAGCCTGGTACTTTCTGCAGAAAAT-ACA— ATCCA 

ii ii i mu i i m mm i limn him ii hi hi i ii 

AGGACAAGATCATTGCAGGATCCTTGTAGTAACTGCCCAGCTGGTACATTCTG-TGATAATAACAGGAATCA 
200 210 220 230 240 250 260 


280 290 300 310 320 330 340 

G-TCTGCAAGAGCTGCCCTCCAAGTACCTTCTCCAGCATAGGTGGACAGCCGAACTGTAACATCTGCAGAGT 

i i 1 1 ii hi Hum ii uimiii imimi u mi mi mu 

GATTTGCAGTCCCTGTCCTCCAAATAGTTTCTCCAGCGCAGGTGGACAAAGGACCTGTGACATATGCAGGCA 
270 280 290 300 310 320 330 


350 360 370 380 390 400 410 

GTGTGCAGGCTATTTCAGGTTCAAGAAGTTTTGCTCCTCTACCCACAACGCGGAGTGTGAGTGCATTGAAGG 

mi m mini ii mi ii mu hi hi n hiiiiii mi i hi 

GTGTAAAGGTGTTTTCAGGACCAGGAAGGAGTGTTCCTCCACCAGCAATGCAGAGTGTGACTGCACTCCAGG 
340 350 360 370 380 390 400 410 


420 430 440 450 460 470 480 

ATTCCATTGCTTGGGGCCACAGTGCACCAGATGTGAAAAGGACTGCAGGCCTGGCCAGGAGCTAACGAAGCA 

ii ii in iiiii n mi ii mm mi n n i u n n n u n 1 

GTTTCACTGCCTGGGGGCAGGATGCAGCATGTGTGAACAGGATTGTAGACAAGGTCAAGAACTGACAAAAAA 
420 430 440 450 460 470 480 


490 500 510 520 530 540 550 

GGGTTGCAAAACCTGTAGCTTGGGAACATTTAATGACCAGAACGGTACTGGCGTCTGTCGACCCTGGACGAA 

urn in mi mi u uiiiiii ii iiiii i mi 1111111111111111 11 

AGGTTGTAAAGACTGTTGCTTTGGGACATTTAACGATCAGAAACG — TGGCATCTGTCGACCCTGGACAAA 
490 500 510 520 530 540 550 

560 570 580 590 600 610 620 630 

CTGCTCTCTAGACGGAAGGTCTGTGCTTAAGACCGGGACCACGGAGAAGGACGTGGTGTGTGGACCCCCTGT 

in hi i ii iiii 1 1 1 ( u 1 1 1 1 ii mu i iiiii 1 1 1 1 1 1 1 1 1 iiiiiih u 

CTGTTCTTTGGATGGAAAGTCTGTGCTTGTGAATGGGACGAAGGAGAGGGACGTGGTCTGTGGACCATCTCC 
560 570 580 590 600 610 620 

640 650 660 670 680 690 

GGTGAGCTTCTCTCCCAGTACCACCATTTCTGTGA CTCCAGAGGGAGGACCAGGAGGGCACTC 

i i iiiii hi mi i in 1 1 1 1 i ii i i hi iiiiii iiiii 

AGCCGACCTCTCT-CCGGGAGCATC— CTCTGTGACCCCGCCTGCCCCTGCGAGAGAGCCAGGA — CACTC 
630 640 650 660 670 680 690 

700 710 720 730 740 750 760 

CTTGCAGGTCCTTACCTTGTTCCTGGCGCTGACATCG — GCTTTGCTGCTGGCCCTGATCTTCATTACTCT 

HU II I 1111 II II HIIIIII III II IIIII I III IIIII I II II 

TrrKcitfiAjrhTnrmmyrrxf'rci'Ti'.iif'f'.Tf'r'tnnf'r'rTnt'Tmrnc.r'TrinrrTrrTr&rm 



700 710 720 730 740 750 760 

770 780 790 800 810 820 830 

CCTGTTCT CTGT GCTCAAATGGATCAGGAAAAAATTCCCCCACATATTCAAGCAACCATTTAAGAAGACCAC 

ii mum i m n m u m m i mum imiiiiii i mm 

CCGTTTCTCTGTTGTTAAACGGGGCAGAAAGAAACTCCTGTATATATTCAAACAACCATTTATG-AGACCAG 
770 780 790 800 810 820 830 

840 850 860 870 880 890 900 

T-GGAGCAGCTCAAGAGGAAGATGCTTGTAGCTGCCGATGTCCACAGGAAGAAGAAGGAGGAGGAGGAGGCT 

i i i i i 1 1 1 1 1 1 1 1 m ii miiiiiiim mi i miiiiiimm i i i n 

TACAAACTACTCAAGAGGAAGATGGCTGTAGCTGCCGATTTCCAGAAGAAGAAGAAGGAGGATGTGAA— CT 
840 850 860 870 880 890 900 

910 920 930 940 950 960 970 

ATGAGCTGTGATGTACTATCCTAGGAGATGTGTGGGCCGAAACCGAGAAGCA-CTAGGA — CCCCACCATC 

III II 11 III I 1111 1 III III! I II III I I iiii mi 

GTGAAATG-GAAGT-CAA — TAGG-GCTGT-TGGGACTTTCTTGAAAAGAAGCAAGGAAATATGAGTCATC 
910 920 930 940 950 960 

980 990 1000 1010 1020 1030 1040 

CTGTGGAACAG CACAAGCAACCCCACCACCCTGTTCTTACACATCAT-CCTAGATGATGTGTGGGCGC 

i i tin n mm inn in nil n n n in i i 

CGCTATCACAGCTTTCAAAAGCAAGAACACCATCCT ACATAATACCCAG— GAT TCCC 

970 980 990 1000 1010 1020 

1050 1060 1070 1080 1090 1100 

GCACCTCATCCAAGTCTCTTCT-AACGCTAACATATTTGTCTTT ACCTTTTTTAAATCTTTTTT 

n i n in nil n n n n i iiii in mi i mm 

CCAACACA — CGTTCTTTTCTAAATGCCAATGAGTTGGCCTTTAAAAATGCACCACTTTTTTTTTTTTTTT 
1030 1040 1050 1060 1070 1080 1090 

1110 1120 1130 1140 1150 

TAA ATTTAAATTT TATG-TG-TGTG-AGT-GTTTTGCC-TGCCTGTATGCACACGTG-TGTG 

i i i i ii i i n in in i n n n i nil i n i 

GGACAGGGTCTCACTCTGTCACCCAGGCTGGAGTGCAGTGGCACCACCATGGCTCTCTGCAGCCTTGACCTC 


1100 1110 1120 1130 1140 1150 1160

1160 1170 1180 1190 1200 1210 1220


TGTGTGTGTGTGTGACACTCCT GATGCCTGAGGAGGTCAGAAGAGAAAGGGTTGGTTCCATAAGAA 

n i i iiii inn i mm n i in iiii ii hi i i 

TGGGAGCTCAAGTGATCCTCCTGCCTCAGTCTCCTGAGTAGCT-GGAACTACAAGGAAGGG-- CCACCACAC 
1170 1180 1190 1200 1210 1220 1230 

1230 1240 1250 1260 1270 1280 1290 

CTGGAGTTA-TGGATGGCTGTGAGCCGGNNNGATAGGTCGGGACGGAGACCTGTCTT CTTATTTTAA 

II II I I I II I I I II I II I II 1 1 1 III I II I 1 II 

CT-GACTAACTTTTTTGTTTTTTGTTGG — TAAAGAT-GGCATTTCGCCATGTTGTACAGGCTGGTCTCAA 
1240 1250 1260 1270 1280 1290 1300 

1300 1310 1320 1330 1340 1350 

CGTGACT-GTATAATAAAAAAAAAAT— GATATTTCGGGAATTGTAGAGATTGTCCTGACA— CCCTTCTAG 

III II II I I II III III III I II III II III I I 

ACTCCTAGGTTCACTTTGGCCTCCCAAAGTGC-TGGGATTACAGACA-TGAACTGCCAGGCCCGGCCA- 

1310 1320 1330 1340 1350 1360 

1360 1370 1380 1390 1400 1410 1420 1430 

TTAATGATCTAAGAGGAATTGTTGATACGTAGTATACTGTATATGTGTATGTATATGTATATGTATATATAA 

III II I III II I I I I I II I III I I I II II 

-AAATAAT — GCACCACTT-TTAACA-GAA-CAGAC — AGATGAGGACAGAGCTGGTGAT 
1370 1380 1390 1400 1410 X 

1440 1450 1460 1470 

GACTCTTTTACTGTCAAAGTCAACCTAGAGTGTCTGGTTA 



3. ELLIS-012-FIG2AB.SEQ (1-2350)

CBRR5A Caenorhabditis briggsae DNA for 5S ribosomal RNA (


LOCUS       CBRR5A 944 bp DNA INV 18-MAR-1991
DEFINITION  Caenorhabditis briggsae DNA for 5S ribosomal RNA (1kb)
ACCESSION   X16225
KEYWORDS    5S ribosomal RNA; leader RNA; ribosomal RNA.
SOURCE      nematode
  ORGANISM  Caenorhabditis briggsae
            Eukaryota; Animalia; Eumetazoa; Nematoda; Secernentea; Rhabditia;
            Rhabditida; Rhabditina; Rhabditoidea; Rhabditidae.
REFERENCE   1 (bases 1 to 944)
  AUTHORS   Honda,B.M.
  TITLE     Direct Submission
  JOURNAL   Submitted (23-AUG-1989) Honda B.M., Simon Fraser University,
            Biology Department, Burnaby B.C., Canada V5A 1S6.
  STANDARD  full automatic
REFERENCE   2 (bases 1 to 944)
  AUTHORS   Nelson,D.W. and Honda,B.M.
  TITLE     Two highly conserved transcribed in the 5S DNA repeats of the
            Nematodes Caenorhabditis elegans and Caenorhabditis briggs.
  JOURNAL   Nucleic Acids Res. 17, 8657-8667 (1989)
  STANDARD  full automatic
FEATURES             Location/Qualifiers
     misc_feature    complement(513..607)
                     /note="spliced leader RNA sequence"
     rRNA            825..943
                     /note="5S ribosomal RNA sequence"
BASE COUNT 269 a 172 c 183 g 320 t
ORIGIN

Initial Score = 163 Optimized Score = 406 Significance = 8.70
Residue Identity = 48% Matches = 486 Mismatches = 409
Gaps = 117 Conservative Substitutions = 0
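
Note on the Residue Identity figure: in the three summaries of this listing where every value is legible (Matches/Mismatches/Gaps of 2349/1/0, 486/409/117 and 1238/954/325), the reported percentages of 99, 48 and 49 are consistent with matches divided by the sum of matches, mismatches and gaps, truncated to a whole percent. The sketch below reproduces that arithmetic; the formula is inferred from these numbers only and is an assumption, not the search program's documented definition, and the function name is illustrative.

```python
def residue_identity(matches: int, mismatches: int, gaps: int) -> int:
    """Percent identity over all aligned columns, truncated to an integer.

    Assumption: every match, mismatch and gap counts as one aligned column.
    """
    aligned = matches + mismatches + gaps
    if aligned == 0:
        raise ValueError("empty alignment")
    return (100 * matches) // aligned


# (matches, mismatches, gaps) as reported for MUSTC41BB, CBRR5A and CLS88D0.
reported = {
    "MUSTC41BB": (2349, 1, 0),
    "CBRR5A": (486, 409, 117),
    "CLS88D0": (1238, 954, 325),
}

for name, counts in reported.items():
    print(name, residue_identity(*counts))
```

Integer division is used so that 2349 matches out of 2350 columns stays at 99 rather than being rounded up to 100, which is what the first summary shows.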


1180 1190 1200 1210 1220 1230 1240 

CCTGATGCCTGAGGAGGTCAGAAGAGAAAGGGTTGGTTCCATAAGAACTGGAGTTATGGATGGCTGTG-AGC 


AAGCTTTTGCTTTTTTTGTATT 
X 10 20 

1250 1260 1270 1280 1290 1300 1310 1320 

CGGNNNGATAGGTCGGGACGGAGACCTGTCTTCTTATTTTAACGTGACTGTATAATAAAAAAAAAATGATAT 

i iii i i I ii i iii min i m i mini n i 

CAATTTCATAAATATGAATTCAAA — GATTTCAATTTTTAAGATTCATGT-TCTTAAAAAATTGCAGAATT 


30 40 50 60 70 80

1330 1340 1350 1360 1370 1380


TTCGGGAATTGTAGAGATTGTCCTGACACCCTTC — TAGTTAATGATCT — AAGAGGAATTGTTGATACGT 

m i i ii mi ii ii iii ii iii i ii i ii mi i i i n 

TTCATCATTGGTTTGGATTTTCGTGCTTTTTTTCTAATATTTATTTATTTTGTAAATAAATTTGTAAAATGT 
100 110 120 130 140 150 160 

1390 1400 1410 1420 1430 1440 1450 

AGTATACT-GTATATGTGTATGTATATGTATATGTATATATAA— GACTCTTTTACTGTCAAAGTCAACCTA 

i i n ini ii ii i in i inn n i ii ii nil tin i 

— TTTCCTCACAAATGAG-AAGTGT-TGTCGAAAAATATAAAATTGTGTCAAATTCAATCAATTTCAATATG 
170 180 190 200 210 220 230 

1460 1470 1480 1490 1500 1510 1520 

GAGTGTCTGGTTACCAGGTCAA-TTTTATTGGACATTTTACGTCACACACACA CACACACACACACAC 

i i n i i mi i in i i nil i n in nil i i i i i 

AAAAATTTGAAT GTTCAAGATGCATTCG-CGTTTT-CTTCCCAC-CACATTGCCTGAGTTTCTGAAAT 

240 250 260 270 280 290 


ACA-CGTTT ATACTACGT ACT GTTATCGGT ATT CTACGTCAT AT AATGGGAT AGGGTAAAAGGAAACCAA AG 

i i ii ii i! mi i i i i imii in in iiiiii i n i 

AAATTGTGCATGAATCG-ACTGAAAATAGATGTGTGTGT-ATAAAAT-TTATATTGTAAAA--ATTTCAGAA 


300 310 320 330 340 350 360

1600 1610 1620 1630 1640 1650



AGTGAGTGATATTATTGT-GGAGGTGAC-AGACTACCCCTTCTG— GGTACG-TAG— GGACAGACCTCCT 


TCTAAG-CCTAAAATTTTAAGAATTCACTAGAATTTAAATGATGACAATTCCGATCGTTTTACAACCCTTTT 
370 380 390 400 410 420 430 

1660 1670 1680 1690 1700 1710 1720 1730 

TCGGACTGTCTAAAACTCCCCTTAGAAGTCTCGTCAAGTTCCCGGACGAAGAGGACAGAGGAGACACAGTCC 

II II I II II II 1 11 III 1 I I I III II II I 

ACGATGCATCGATTTTTCGATTTTCAATGGACCATAA-CTCCGGCATG-CGTTGAC — CGATTTTCAATTT 
440 450 460 470 480 490 500 

1740 1750 1760 1770 1780 1790 

GAAAAGTTATTTT-TC-CGGCAA ATCCTTTCCCTGTTTCGTGAC — ACTCCACCCCT-TGTGGACACT 

I II 1 1 1 1 1 1 1 1 II I I I II Mil II I I II III I I III 

TTAAAGTTATTTTGTCTCCCCGAGAGGAGACGTTCCAAAATTTATAGCTAACGCCAAATTTCTTTGG G 

510 520 530 540 550 560 

1800 1810 1820 1830 1840 1850 1860 

TGAGTGTCATCCTTGCGCCGGAAGGTCAGGTGGT-ACCCGTCTGTAGGGGCGGGGAGACAGAGCCGCGGGG 

I III III II II Ii I III I I II II I III III I I I I II 
TCAGTTTCAATGTT-TACCTCAAACTTGGGTAATTAAACCAACTACATCGGC-GGGCTTCGCACACTAGTGG 
570 580 590 600 610 620 630 

1870 1880 1890 1900 1910 1920 1930 

GAGCTACGAG-AATCGACTCACAGGGCGCCCCGGGCTTCGCAAATGAAACTTTTTTAATCTCACAAGTTTCG 

i inn mi i i i i nil i i mi i i in i n n 

AA--AACGAGCGGCCGACACCAATCGAGGCCCGCG-TCGGCCACCG — CATTTCGA CA GCG 


640 650 660 670 680 690

1940 1950 1960 1970 1980 1990 2000


TCCGGGCTCGGCGGACCTATGGCGTCGATCCTTATTACCTTATCCTGGCG — CCAAGATAAAACAACCAAA 

I II II II II ill II II III II II I III I I II II I I 

TGCGCGC — GCACACTGCTGG-GT-GAGTCTTCTT--CTACTGTTGGGGGGACTGGGAGAATTCGCTCTTC 
700 710 720 730 740 750 

2010 2020 2030 2040 2050 2060 2070 

AGCCTT — GACTCCGGT-ACTAATTCTCCCT-GCCG-GCCCCCGTAAGCATAACGCGG-CGATCTCCACTTT 

n n n i i in i mu in i n i n in i n 

TGCGTTTCGATTTATTTCACTGA — TCCCTAGTAGAGTTAAAAGGGGAATGTAGAGGTAGATGTGATGCTT 
760 770 780 790 800 810 820 

2080 2090 2100 2110 2120 2130 

AAGAACCTGGCCGCGTTCTGCCTGGTCTCGCTTTCGTAAACGGTTCTTACAA — AAGTAATTAGTT-CTTG 

I Mil I III II 11 1 I I III II III III III II III I 

ACG-ACCATATCACGT— TGAATGCACGCCATCCCGT— CCGATCTGGCAAGTTAAGCAA— CGTTGAGTC 
830 840 850 860 870 880 890 

2140 2150 2160 2170 2180 2190 2200 

CTTTCAGCCTCCAAGCTTCTGCTA-GTCTATGGCAGCATCAAGGCTGGTATTTGCTACGGCTGACCGCTACG 

I I II I I I II I I I I II I I I I I II I I 

CAGTTAG-TACTTGGATCGGAGACGGCCTGGGAATCCTGGATGTTGTAAGCTT 



900 910 920 930 940 X

2210 2220 2230 2240




CCGCCGCAATAAGGGTACTGGGCGGCCCGTCG 


4. ELLIS-012-FIG2AB.SEQ (1-2350)

CLS88D0 Hamster EcoRI donor DNA fragment for S88 aprt inse



LOCUS       CLS88D0 3906 bp DNA ROD 20-MAY-1992
DEFINITION  Hamster EcoRI donor DNA fragment for S88 aprt insertion
ACCESSION   X14996 X13999 X14000
KEYWORDS    Alu repetitive sequence; insertion sequence; repetitive sequence.
SOURCE      Chinese hamster
  ORGANISM  Cricetulus longicaudatus
            Eukaryota; Animalia; Metazoa; Chordata; Vertebrata; Mammalia;
            Theria; Eutheria; Rodentia; Myomorpha; Cricetidae; Cricetinae;
            Cricetini.
REFERENCE   1 (bases 361 to 690; 1481 to 2640)
  AUTHORS   Nalbantoglu,J., Miles,C. and Meuth,M.
  TITLE     Insertion of Unique and Repetitive DNA Fragments into the aprt
            Locus of Hamster Cells
  JOURNAL   J. Mol. Biol. 200, 449-459 (1988)
  STANDARD  full automatic
REFERENCE   2 (bases 1 to 3906)
  AUTHORS   Meuth,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (13-FEB-1989) to the EMBL Data Library.
  STANDARD  full automatic
COMMENT     #source; clone_library=lambda NM1149.;
            See x07513 for S88 mutant seq.
FEATURES             Location/Qualifiers
     misc_feature    572..624
                     /note="CHO Alu repeat"
     conflict        1625
                     /citation=[1]
                     /note="c is g in [1]"
     misc_feature    1734..2018
                     /note="S88 donor DNA"
BASE COUNT 1048 a 945 c 905 g 1008 t
ORIGIN

Initial Score = 162 Optimized Score = 1019 Significance = 8.63
Residue Identity = 49% Matches = 1238 Mismatches = 954
Gaps = 325 Conservative Substitutions = 0


X 10 

ATGTCCATGA- — ACTGC-T 

1 III!! I II I I 

TCAGGGTTCCACCATATCCTGAAGAGCAGATGAGACACCAGGCTTTTTACCTTTCCATCAGGAGACAACCAT 
940 950 960 970 980 X 990 1000 

20 30 40 50 60 70 80 

GAGTGGA-TAAACAGCACGGGATATCTCTGTCTAAAGGAAT-ATTACTACACCAGGAAAAGGACA-CATTCG 

II Mil II III II I I llll I I II I I I I I II II III III 

GATTGGACTAGCCAG-ACCGCA — GCCTGTGAGCCGTTCTAATCAATCCCCTACAAATAGAACATTTTTCT 


1010 1020 1030 1040 1050 1060 1070

90 100 110 120 130 140 150


AC-AACAGGAAAGGAGC-CTGT CACAGAAAACCACAGTGTCCTGTGCATGTGACATTTCGCCATGGGA 

I llll I I II I llll I III I III HIM II II llll I 

TCCATCAGTTCTGTTCCGTTGGAGAACCCAGACTAATACAATCTCC CATGTTCCAGTT — TCATGTGT 

1080 1090 1100 1110 1120 1130 

160 170 180 190 200 210 

AACAAC — TGTTACAACGTGGT — GGTCATTGTGCTGCTGCTAGTGGGCTGTGAG-AAGGTGGGAGCCGTGC 

ii ii hi in ii hi i ii hi i ii iiiiii ii i mi i i i 

CACTACCATGTGACAGGCCAGTAGGGTGAGTGGGCT-CAGC — TGGGCTCTGCGCCAGGTCGCGATCAAAC 
1140 1150 1160 1170 1180 1190 1200 

220 230 240 250 260 270 280 

AGA — ACTC — CTGTGATAA-CTGTCAGCCTGGTACTTTCTGCAGAA — AATACAATCC-AGTCTGCA-AG 

III III II I II II III llll I I II llll I I llll I I llll II 

4r:ArrAr:TrTf:TTrrf;rAA4rrTrTrATrrTrflrrTTf'Tr:Tn-AriArrrArrTrTATrrrAATATrrATAr 



1210 1220 1230 1240 1250 1260 1270 


290 300 310 320 330 340 350 

AGCTGCCCT CCAAGT ACCTTCTCCAGCAT AGGT GGACAGCCGAACTGTAACATCTGCAGAGTGTGTGCAGGC 

I III I I I II I I III I I I III I II II III II II I 
A CCCCACTTGCAGC-TGTAGGACTGGGGTCCCCTTTCCAGCTG-GATGTCAGC-GAGGGT — CA-CC 


1280 1290 1300 1310 1320 1330

360 370 380 390 400 410


TATTTCAGGTTCAAGAAGTTTTGCTCCTCTACCCACAACGCGGAGTGTGAGTGCATTGAA GGA 

I I II I III I I III! I I III II II II II II II 

T-TAGCAACTGACAGACTTCTCATTCCTTCATCTTCAAGGC--AGCAAGATCACCTCAAATCCTCCTCCTGA 
1340 1350 1360 1370 1380 1390 1400 

420 430 440 450 460 470 480 

TTCCA — TTGCTTGG-GGCCAC-AGTGCA-CCAG-ATGTGAAAAGGACTGCAGGCCTGGCCAGGAGCTAACG 

ii i ii iii ii i im mi in i ii ii i i n i mi i n 

GTCAAACCTCCCTGGCTTCCTCTTCTGCACCCAGCCAGTG-ACAGCTCT-CTGCTCT-GACAGGTTCAGACC 
1410 1420 1430 1440 1450 1460 1470 

490 500 510 520 530 540 550 

AAGCAGGGTTG-CAAAAC-CTGTAGCTTGGGAACATTTAATGACCAGAACGGTACTG-GCGTCTGTCGACCC 

I llll III 1 I II II 1 I II I II II I II 1 II 1 I I I 11 1 
CA-CAGGATGGTCTGTTCTCATTACCT— CCACCCTTCACGGATCACACAGGAAGTGAGGGAGGGACAACTC 
1480 1490 1500 1510 1520 1530 1540 


560 570 580 590 600 610 620 

TGGACGAACTGCTCTCTAGACGGA-AGGTCTGTGCTTAAGACCGGGACCACGGAGAAGGACGTGGTGTGTGG 

n i i i i i in in mi n i i i n in n n i n i 

CAGA-TTCCCGTCCAGCACAGGGACAGGGAGGTGCACCAG-CTG — AAGACAGAGTAG — AGTTGAATGCTG 
1550 1560 1570 1580 1590 1600 1610 

630 640 650 660 670 680 690 

ACCCCCTGTGGTGAGCTTCTCTCCCAGTACCACCATTTCTGTGACTCCAGAGGGAGGACCAGGAGGGCACTC 

III I II I II I II III I I III I I I II I I III 
ACCAAGTCAGGACACTTTGAGGAGATGGCTGTTCAGAACTGAG-CCCCAAATTGCG — CCTGTGAAGAACTG 
1620 1630 1640 1650 1660 1670 

700 710 720 730 740 750 760 

CTTGCAGGTCCTTACCT — TGTTCCTGGC — GCTGACATCGGCTTTGCTGCTGGCCCTGATCTTCATTACT 

II III I I III I III II I II I I I II I I III III I I I 

ATT-CAGATAC-TACATGAGTGTGATTGACAAGCAGGAAACAGCATGG— GCT AAGATGTAAAGCA— 

1680 1690 1700 1710 1720 1730 1740 

770 780 790 800 810 820 830 

CTCCTGTTCTCTGTGCTCAAATGGATCAGGAAAAAATTC-CCCCACATAT-TCAAGCAACCATTTAAGAAGA 

I III II II II I II I II I III III III I III II I 

— GAGAAATCT-TG-ACACATACA-CA— CACTCATACAGCCCTGTTATCACAATAAGACATGTA— TTGG 
1750 1760 1770 1780 1790 1800 

840 850 860 870 880 890 900 

CCACTGGAGCAGCT-CAAGAGGAAGATGCTTGTAGCTGCCGATGTCCACAGGAAGAAGAAGGAGGAGGAGGA 

mi n i i i i i n n i n i n ii i n nil 

TCACTCCCAAAGGTCCTTGTGCCCGCTG-AGGTCCCCTCC-CTCACC-CTGTCTGCTCCCCCAGACCCAGGA 
1810 1820 1830 1840 1850 1860 1870

910 920 930 940 950 960 

GGCTATGAGCTGTGATGTACTA-TCCTAG-GAGATGTGTGGGCCGAAACCGAGAAG-CACTAGGACCCCA-- 

i n i mi i i in nnn n n i i n i i inn in i i n 

AACCATAA — TGTGCT-TTCTATTCCTAGAGATATATTTTGG — GCTTTCTAGAAGTTTCTACAAACTCAGA 
1880 1890 1900 1910 1920 1930 

970 980 990 1000 1010 1020 

-CCA-TCCTGTGGAACAGCACAAGC AACCCCACCACCCTGTTCTTACACATCATCCTAGATGA- 

n i nnn i n i i n nil inn i n i n i i 

C4rArTf;TTr;TCf;fiTT4ArAA4fVTrrTrTr:TTAA4r‘rri'’A4r:rf'r/’Ti'f'f'4Tf'TrAf'A a^tt a at 


1940 1950 1960 1970 1980 1990 2000

1030 1040 1050 1060 1070 1080 1090


TGTGTGGGCGCGCACCTCATC-CAAGTCTCTTCTAACG-- CTAACATATTTGTCTTTACCTTTTTTAAATCT 

III I I I I I I III Mil I III I III I II I I I II I III 
TGTATTTG-GAGTA— TTATCTGAAGT-TAAAATGAAGTTGTTACAGTGAGGCCTGAATCCGTGTTTATTCT 
2010 2020 2030 2040 2050 2060 2070 


1100 1110 1120 1130 1140 1150 1160 

TTTT — TTAAATTTAAATTTTATGTGTGTGAGTGT-TTTGCCTGCCTGTATGCACACGTGTGTGTGTGTGT 

iii ii i iii i mi iiiini mi i n n in n nnnnnnn 

ATTTATATTTACCTTATAATTTACGTGTGTGTGTGTGTGTGTGTGTGTGTGTG TGTGTGTGTGTGTGT 

2080 2090 2100 2110 2120 2130 2140 

1170 1180 1190 1200 1210 1220 

GTGTGTG-ACACTCCTG — ATGC-CTGAGGA — GGTCAGAAG — AGAAAGGG — TTGGTTCCATAAGAAC 

in n i n n in n i i mi i in mi i i n in i 

GTGCCTGCATACATGTGTGTGTGCTATGTGCATACCCACAGAGGCCAGAGAGGGCATCAGACCCTGAAG— C 
2150 2160 2170 2180 2190 2200 2210 


1230 1240 1250 1260 1270 1280 1290 

TGGAGTTATGGATGGCTGTGAGCCGGNNNGATAGGTC-GGGACGGAGACCTGTCTTCTTATTTTAACGTG-- 

inii n niiiinin ii i i n tin i i i i in i mm i 

TGGAGCTAGAGATGGCTGTGAACTGCCATGTGGGTTCTGGGAACCA-AACAGGGTTC — TCTGAAAGAGCA 
2220 2230 2240 2250 2260 2270 

1300 1310 1320 1330 1340 1350 1360 

ACT-GTAT-AATAAAAAAAAAATGATATTTCGGGAATTGTAGAGATTGTCCTGACA-CCCTTCTAGTTAATG 

III II I III 1 II III 1 1111 III I I I I II I II 1 

ACTGGTGTTCTTAATTCCTGAGCCATCCTTC-AGCCCCCTAGA-CCTGTTTTTATATGGCCTCAGGGTATAG 
2280 2290 2300 2310 2320 2330 2340 

1370 1380 1390 1400 1410 1420 

ATCTAAGAGGAATTGTT — GATACGTAGTATACTGTATAT- — GTGTATGTA-TATGTATATGTATATAT 

i i in inn mi n i in i i i i i nil i n i n 

ACCAAAGCCCTTTTGTTTTGCATACATATACAAATGTTCCTTGCCCTCAAGGAATTATGCTTCCTTAAAAAT 


2350 2360 2370 2380 2390 2400 2410 2420

1430 1440 1450 1460 1470 1480 1490


AAGACTCTTTTACTGTCAAAGTCA — ACCTAGAGTGTCTGGTTACCAGGTCAATTTTATTGGACATTTTAC 


AAGTC-AAAATAC-ATCAAATACACAGACATACACAGCTAACAAACCAAG-CAGCCGAGCTTGGCACTCCAG 
2430 2440 2450 2460 2470 2480 

1500 1510 1520 1530 1540 1550 1560 

GTCACACACACACACACACACACACACACACGTTTATACTACGTACTGTT-ATCGGTATTCTACGTCATATA 

ill I 1 II I I Mil I 1 III II III II 1 I I I III I 

AGCACTGTATGATGC-CA-GGAGATTCTCACTTGTGAACT — TAGGGTTCAT— GAAGCCCCAGCCATGGA 
2490 2500 2510 2520 2530 2540 2550 

1570 1580 1590 1600 1610 1620 1630 

ATGGGATAGGGTAAAAGGAAACCAAAGAGTGAGTGATATTATTGTGGAGG — TGACAGACTACCCCTTCTGG 

i i mi in i i in in n i i n n in n i i n nil 

GGGAAAGAGGGAGGGAGGGAGGGAGGGAGGGAGGGA-GGGAGGGAGGGGGGATGAAGGAAAAACTCTCCTGG 
2560 2570 2580 2590 2600 2610 2620

1640 1650 1660 1670 1680 1690 1700 

GTACGTAGGGACAGACCTCCTTCGGACTGTCTAAAACTCCCCTTAGAAGTC — TCGTCAAGTTCCCGGAC — 

I I II II I II I I Ml I II II Ml II I II I II II 

— AATTTTTGA-TTAC — CATT-GCATAGTCTTAGACCAATCTGAGAA-TCAATACTCTAATTTGTAAACAT 
2630 2640 2650 2660 2670 2680 2690 

1710 1720 1730 1740 1750 1760 

— GAA — GAGGACAGAGGAGACACAGTCCGAAAAGTTATTTT TCCGGCAAATCCTT — TCCCTGTTT 

II 111 I I II II till I I III III I I II Mill I 

TTTAArTTAATTATAATTTATATArTrT A A A A ATT ACTTT in' ATi’AA TrAP A AOrTTr- A A rr PT^TAT 


2700 2710 2720 2730 2740 2750

1770 1780 1790 1800 1810 1820 1830


CGTGACACTCCACCCCTTGTGGACACTTGAGTGTCATCCTTGCGCCGGAAGGTCAGGTGGT — ACCCGT-C 

i mi ii i ii mi ii n i i i ini i i i n 

C-AAACAATACAAAAACTTAAAAAAAAAGAGT-TAA — AAAGCTATGTATGCTAAGATACTCAAATCAGTAT 
2760 2770 2780 2790 2800 2810 2820 

1840 1850 1860 1870 1880 1890 1900 

TGTAGGGGCGGGGAGACAGAGC— CGCGGGGGAGCTACGAGAATCGACTCACAGGGCGCCCCGGGCTTCGCA 

1111 II I Ii I I I I I I I II III I II III I III I I 

TGTATGCGTAAGTTGATAAACCAAAAAACAAAAACCATAACAAAGGACAGAAAGATGGCCTC-AGCT-GGTA 
2830 2840 2850 2860 2870 2880 2890 

1910 1920 1930 1940 1950 1960 

AA TGAAACTTTTTTAATCTCACAAGTTTCGT--CCGG-GCTCGGCGGACCTATGGCGTCGATCCTT 

II II II II I II 1111 I I llll I II I II I III 
AAGGTGTGTGTCACGGCCTTGA— TCGTGAGTTCCATCCCCGGAACCTATCTGGTGGAAGGAGAGGAT— GG 
2900 2910 2920 2930 2940 2950 2960 

1970 1980 1990 2000 2010 2020 2030 

ATTACCTTATCCTGGCGCCAAGATAAAACAACCAAAAGC — CTTG-ACTCCGGTACTAATTCTCCCTGCCGG 

I I III III I II I I III II I II I II II I I II II I I I I 

ACTCTCTTGCTCTGTCCCCTA-CTTTCACAGGCACAGGCTTCCTGCACACAG— ACGCATAC-ACATAAATG 


2970 2980 2990 3000 3010 3020 3030

2040 2050 2060 2070 2080 2090 2100


CCCCCGTAA — GCATAACGCGGCGA-TCTCCACTTTAAGAACCTGGCCGCGTTCTGCCTGGTC-TCGCTTTC 

i n i i i i i i i n ii in mi n inn i in i n i 

TAACTTAAATTGTGTTAAGTTTCCACTGTCAAGTGTAACAACCCAGCGAGTTTCTGTATAGTCATAGCCT — 


3040 3050 3060 3070 3080 3090 3100

2110 2120 2130 2140 2150 2160 2170


GTAAACGGTTC — TTACAAAAGTAATTAGTT CTTGCTTTCAGCCTCCAAGCTTCTGCTAGTCTATGGCAGCA 

in i i mi n i n i n i n i in i ii i n in n 

GTAGCCATCACCATTACCAAGTGCAGAAGATTTTTATCACA-CAAGAAAGGAGC-CCCATGCCATCGCATCA 
3110 3120 3130 3140 3150 3160 3170 

2180 2190 2200 2210 2220 2230 2240 

TCAAG-GCTGGTATTTGCTACGGCTGACCGCTACGCCGCCGCAATAAGGGTACTGGGCGGCCCGTCGAAGGC 

i i i i i mi i in in i i mi in in n n i i 

CCCCGACCCTGCAGCTGCT-CATCTG-CCG-TCCATCTCTG-AAT T — TG CCTTCTACAGA 

3180 3190 3200 3210 3220

2250 2260 2270 2280 2290 2300 2310 

CCTTTGGTTTCAGAAACCCAAGGCCCCCCTCATACCAACGTTT-CGACTTTGATTCTTGCCGGTACGTGGT- 

I II I I II I III I III III I III II II I II II I ii II 

CTCTTCATAT-AAAGAGTCAA — TCAACCT CCAGCCTTTGCGTCTGGCGTATTTCCCTGAGCGCAGTA 

3230 3240 3250 3260 3270 3280 3290 

2320 2330 2340 X 

GGTGGGTG CCT — TAGCTCTTTCTCGATAGTTA GAC 

in i i in mi i i i in i in 

GGTCAGAGCAGACATCCTGACAGCTTTGAGACTACAGTGACAGTGACATTTCACATGCAGACAGACAAACCA 
3300 3310 3320 3330 3340 3350 3360 

GGGTGGGGCCGTCCTGCAGAAGCGG 
3370 3380 3390 


5. ELLIS-012-FIG2AB.SEQ (1-2350)

XELAEIP X.laevis amidating enzyme (AE-I) mRNA, complete cd

LOCUS XELAEIP 2733 bp ss-mRNA VRT 15-MAR-1989

DEFINITION X.laevis amidating enzyme (AE-I) mRNA, complete cds.

ACCESSION [illegible]


KEYWORDS amidating enzyme.

SOURCE X.laevis skin, cDNA to mRNA, clone pXAE457.

ORGANISM Xenopus laevis

Eukaryota; Animalia; Chordata; Vertebrata; Amphibia; Lissamphibia;
Anura; Archeobatrachia; Pipoidea; Pipidae; Xenopodinae.

REFERENCE 1 (bases 1 to 2733)

AUTHORS Mizuno,K., Ohsuye,K., Wada,Y., Fuchimura,K., Tanaka,S. and
Matsuo,H.

TITLE Cloning and sequence of cDNA encoding a peptide C-terminal
alpha-amidating enzyme from Xenopus laevis
JOURNAL Biochem. Biophys. Res. Commun. 148, 546-552 (1987)

STANDARD full automatic

COMMENT Amidating enzyme protein precursor is cleaved at two sites to
obtain the active enzyme.

FEATURES Location/Qualifiers 

mRNA <1..2733

/note="AE-I mRNA"
sig_peptide 266..376

/codon_start=1

/note="amidating enzyme signal peptide"
mat_peptide 377..1408

/codon_start=1
/note="amidating enzyme"

CDS 266..1468

/note="amidating enzyme precursor"

/codon_start=1

/translation=”MASLSSSFLVLFLLFQNSCYCFRSPLSVFKRYEESTRSLSNDCL 
GTTRPVMSPGSSDYTLDIRMPGVTPTESDTYLCKSYRLPVDDEAYVVDFRPHANMDTA 
HHMLLFGCNIP5STDDYWDCSAGTCMDKSSIMYAWAKNAPPTKLPEGVGFRVGGKSGS 
RYFVLQVHYGNVKAFQDKHKDCTGVTVRVTPEKQPQIAGIYLSMSVDTVIPPGEEAVN 
SDIACLYNRPTIHPFAYRVHTH0LGQVVSGFRVRHGKWSLIGRQSP9LPSAFYPVEHP 
VEISPGD 1 1 ATRCLFTGKGRTSATY I GGTSNDEMCNLYIHYYMDAAHATSYHTCVQTG 
EPKLF8NIPEIANVPIPVSPDHMHHMGHGHHHTEAEPEKNTGL88PKREEEEVLD8GL 
ITLGDSAV” 

BASE COUNT 823 a 555 c 547 g 808 t 
ORIGIN 

Initial Score = 157 Optimized Score = 960 Significance = 8.29

Residue Identity = 46% Matches = 1185 Mismatches = 996

Gaps = 341 Conservative Substitutions = 0 

X 10 20 

ATGTCCATGAACTGCTGAGTGG 

i mi min i 

TCAGGAGTCCCCTCTCTGTCTTTAAGAGGTATGAGGAATCTACCAGATCACTTTCCAATGACTGCTTGGGAA 
330 340 350 360 370 380 390 400 

30 40 50 60 70 80 

ATAAACAGCACGGGATATCT-CTGTCTAAAGGAATATTACT ACACCAGGA — AAAGGACACATTCGAC 

I I II II II III Mill II Mil I I I I I Mil I II I 
CCACGCGGCCCGTTATGTCTCCAGGCTCATCAGATTATACTCTAGATATCCGCATGCCAGGAGTAACTC — C 
410 420 430 440 450 460 470 

90 100 110 120 130 140 

AACAGGAAAGGA-GCCTGTCACAGAAAACC — ACAGTGT-CCTGTGCATG-TGA CATTTCG-CCATGG 

1111 III I I I I II I III I II III Ml III II II I 
GACAGAGTCGGACACATATTTGTGCAAGTCTTACCGGCTGCCAGTGGATGATGAAGCCTATGTAGTTGACTT 
480 490 500 510 520 530 540 

150 160 170 180 190 200 210 

GAAACAACTGTTACAACGTGG-TGGT-CA-TTGTGCTGCTGCTAGTGGGCTGTGAGAAGGTGGGAGCCGTGC 

I II II I III III I III I IIM Ml I II II I I II 

CAGACCAC-ATGCCAATATGGATACTGCACATCACATGCTTCTATTTGGATGCAATATACCTTCTTCCACTG 
550 560 570 580 590 600 610 

220 230 240 250 260 270 280


AGAACT CCTGTGATAACTGTCAGCCTGGTACTTTC-TGCAGAAAATACAATCCAGTCTGCAAGAGCTGCC — 

i i i mi iiiii i ii mi inn ilium i n mi 

ATGATTACTGGG — ACTGTAGTGCGGGAACTTGCATGGACA AATCCAGTAT — AATGT ATGCCTG 

620 630 640 650 660 670 

290 300 310 320 330 340 350 

CTCCAAGTA-CCTTCTC-CAGCATAGGTGGACAGCCGAACTGTAACATCTGCAGAGTGTGTGCAGGCTATTT 

I III I I I I II II I I III I II I I I I 1111 II ill | I 

GGCAAAGAATGCACCACCCACCAAACTT— CCAGAAGGAGT-TGGC-TTTC GTGT-TGGAGG-GAAAT 

680 690 700 710 720 730 

360 370 380 390 400 410 420 

CAGGTTCAAGAAGTTTTGCTCCTCTA-CCCACAACGCGGAGTGTGAGTGCATTGAAGGAT-TCCA T 

mi in iiiii i n i in i i i i iiiii iiiii iiiii n i 

CAGGCAGTAGATATTTTGTGCTTCAAGTTCACTATG-GAAATGTGAAAGCATTCCAGGATAAACATAAAGAT 


740 750 760 770 780 790 800

430 440 450 460 470 480


TGC-TTGGGGCCACAGT GCACCAGATGTGAAAAGGACTGCAGGCCTGGCCAGGAGCTAACGAAGCAGG 

in mi iiiii i i n n nil n in i n ii n i 

TGCACGGGGGTGACAGTACGAGTAACACCTG-AAAAACAACCGCA--AATTGCAGGCATTTATC 


810 820 830 840 850 860

490 500 510 520 530 540 550


GTTGCAAAACCTGTAGCTTGGGAACATTTAATGACCAGAACGGTACTGGCGTCTGTCGACCCTG-GACGAAC 

11 III III III II III I III II I I I I II I III I I 

-TTTCAATGTCTG TGGACACTGTTATT--CCACCTGGGGA— AGAGGCAGTTAATTCTGATATCGCC 

870 880 890 900 910 920 930 

560 570 580 590 600 610 620 

TGCTCTCTAGACGGAAGGTCTGTGCTTAAGACC--GGGACCACGGAGAAGGACGTGGTGTGTGGACCCCCTG 

in mi n in ii n n i in in i i n i ii 

TGC-CTCT-- ACAACAGG-CCGACAATACACCCATTTGCCTACAGAGTCCACACTCATCAGTTG-GGGCAGG 



940 950 960 970 980 990

630 640 650 660 670 680 690


TGGTGAGCTTCTCTCCCAGTACCACCATTTC-TGT-GACTCCAGAGGGAGGACCAGGAGGGCACTCCT— TG 

I II II 1 1 III I III I III II I II III I II III II 

TCGTAAG-TGGATTTAGAGTGAGA-CATGGCAAGTGGTCTTTAATTGGTAGACAA — AGCCCACAGCTGCCA 
1000 1010 1020 1030 1040 1050 1060 

700 710 720 730 740 750 760 

CAGG-TCCTTACCTTGTTCCTGGCGCTGACA-TCG-GCTTTGCTGCTGGCCCTGATCTTCATTAC TCT 

mi iiiii n i i n n i i i n n mi i n i n ii 

CAGGCATTTTACCCTG TAGAGCATCCAGTAGAGATTAGC-CCTGGGGAT-ATTATAGCAACCAGGTGT 

1070 1080 1090 1100 1110 1120 1130 


770 780 790 800 810 820 

CCTGTTCTCTGTGCTCAA — ATGGA — TCAGGAAAAAAT-TCCCCCACATATTCAAGCAACCATTTAAGAAG 

mm i n i n in nn n i n i n i n in i i i i i 

-CTGTTCAC — TGGTAAAGGCAGGACGTCAGCAACATATATTGGTGGCACA-TCT — AACGA-- TGAAATG 
1140 1150 1160 1170 1180 1190 

830 840 850 860 870 880 890 

ACCACTGGAGCAGCTCAAGAGGAAGATGCTTGTAGC — TGCCGATGTC-CACAGGAAGAAGAAGGAGGAGG 

1 I I 1 III I I III II II II I I III III II II II 

TGTAATTTA-TACATCATGTATTACATGGATGCGGCCCATG-CTACGTCATACATGACCTGTGTACAGACGG 


1200 1210 1220 1230 1240 1250 1260

900 910 920 930 940 950 960


AGGAGGCTATGAGCTGTGATGTACTATCCTAGGAGATGTG TGGGCCGAAACCGAGAAGCACTAGGACC 

n i i i iii n nil iiiii n n n i n nn n in 

GTGAACCAAAGTTATTTCA-AAAC-ATCC-CTGAGAT-TGCAAATGTTCCCATTCCTGTAAGCCCT — GACA 
1270 1280 1290 1300 1310 1320 1330 



CCACCATCCTG-TGGAACA — GCACAAGCA-ACCCCACC--ACCCTG — TTCTTACA CATCATC 

i ii ii 111 111 111 i ini 111 mi mi ii ii i 

TGATGATGATGATGGGACATGGTCACCACCATACAGAAGCTGAGCCTGAGAAGAATACAGGACTTCAGCAGC 
1340 1350 1360 1370 1380 1390 1400 

1030 1040 1050 1060 1070 1080 1090 

CTAGATGATGTGTGGGCG-CGCACCTCATCCAAGTCTCTTCTAACGCTAACATATTTGTCTTTACCTTTTTT 

III I I I II I I I III Hill 1 II I II II 1 1 1 I 

CTAAACGGGAGGAGGAAGAAGTATTAGATCAGGGTCTCAT-TA-CCTTAGGGGATAGCGCAGT— GTGATGG 
1410 1420 1430 1440 1450 1460 1470 

1100 1110 1120 1130 1140 1150 1160 

A-AATCTTTTTTTAAATTTAAATTTTATGTGTGTGAGTGTTTTGCCTGCCTGTATGCACACGTGTGTGTGTG 

ii i i i ii ii i i I iii i i i i ii i mi i i i i 

AGGAGGACATGATCCCTATACCGTTGAAGGGGATGACCCAAT— CAT — TTTAAAGA-ACGT-TCT-TTTA 


1480 1490 1500 1510 1520 1530

1170 1180 1190 1200 1210 1220 1230


TGTGTGTGTGA-CAC-TCCTGATGCCTGAGGAGGTCAGAAGAGAAAGGGTTGGTTCCATA-AGAACTGGAGT 

II I II III III I I I I II II I I 1111 I III I II 1 

AACATGAGAGACCACATCCAGGAGACATAAATCCACA-AATTGTATAAGTTGTGTGTATACATCAC — CCT 
1540 1550 1560 1570 1580 1590 1600 

1240 1250 1260 1270 1280 1290 1300 

TATGGATGGCTGTGAGCCGGNNNGATAGGTCGGGA-CGGAGACCTGTCTTC-TTATTTTAACGTGACTGTAT 

II III I II II III II I I 1111 III I II I I III 

TTT — ATGACAAAGATCC-ATAATATAATACGTTATCACTGACCCTTCTGCAACATCCTTAATCCAGGATTT 
1610 1620 1630 1640 1650 1660 1670 

1310 1320 1330 1340 1350 1360 1370 

AATAAAAAAAAAATGATAT-TTCGGGAATTGTAGAGATTGTCCTGACACCCTTCTAGT-TAATGATCTAAGA 

II I II I I I II I I II I I III ill I I I II I I 
GCTCACTCTCCATTGCTGTCATACAGATGTTCACTTATGGGC — AACAAAATACTTTTCTCCTAATTCAGGT 
1680 1690 1700 1710 1720 1730 1740 

1380 1390 1400 1410 1420 1430 

GGAATTGTTGATA-CGTAGTATA-CTGTAT-ATGTG — TATGTATATGTATATGTA-TATATAAGACTCTT 

I II II II III I III II II II II I I II II I II II II II 
CCAGTTTTTCTCATTGAAGTGCATCTGGCTCAATTGACAAATCTA-AAATTGATTTAGGAAATCAG-CTTTT 
1750 1760 1770 1780 1790 1800 1810 


1440 1450 1460 1470 1480 1490 1500 

TTACTGTCAAAGTCAACCTAGAGTGTCTGGTTACCAGGTCAATTTTATTGGACATTTTAC-GTCACACACAC 

i i inn i n nil i n i i in n i i n nil i i i 

TCCCCATCAAATTGAA — GCTGGCCCAAAAGTTACTCTTAAAAGA-AGGTGACAGTCA-AGTCTC 

1820 1830 1840 1850 1860 1870 


1510 1520 1530 1540 1550 1560 1570 

ACACACACACACACACACACGTTTATACTACGTACTGTTATCGGTATT— CTACGTCATATAATGGGATAGG 

III I III I I 1111 11 II II III II I I I I II I II 
A-ACTTTTGCCCACTGAGTTAGTGATACCAATTCTGTGTAGGGGAATTAAGTAGCTTTTCTTAAAGGGTTGG 
1880 1890 1900 1910 1920 1930 1940 

1580 1590 1600 1610 1620 1630 

GTAA AAGGAAAC CAAAGAGTGAGTGAT— ATTATTGTGGAGGTGACAGACTACCCC 

I I III III I III 1111 II II I I I I I II II 

TTCACCTTTAAGTCAACTTTTAGTATGTTATAGAATGACTAATTCATAAATAAATAAATAAAAG — CAGCTT 
1950 1960 1970 1980 1990 2000 2010 

1640 1650 1660 1670 1680 1690 1700 

TTC — TGGGTACGTAGGGACAGACCTCCTTCGGACTGTCTAAAACTCCCCTTAGAAGTCTCGTCAAGTTCC- 

in i in i i i i i i i i i n i i in in i i mi 

TTCAATTGGT-CTTCATTATTTATTTTGTATAGTTTTTTTATTATTTGTCTTTTTCATCT-GACTTTTTCCA 
2020 2030 2040 2050 2060 2070 2080 



-CGGACGAA — GAGGACAGAGGAGACACAGTCCGAAAAGTTATTTTTCCGGCAAATCCTTTCCCTGTTTCGT 

I I II I I! I! Ill I II Nil II I I II I II III I 

GCTTTCAAATGGGGGTCA — CTGACCCCATCTAAAAAAACAAATGCTCTGTAAGAC — TACAAATTTATT 
2090 2100 2110 2120 2130 2140 2150 

1780 1790 1800 1810 1820 1830 

GACACTCCACCCCTTGTGGACACTTGAGTGTCA-TCCTTGCGC — CGGAAGGTCAGG — TGGTA — CCCGTC 

I III I I I I II III II I I I I III I I II II II 

GTTACTGCTTTTAATTAGTAATGTTTCTATTCAGGCCCTCCCCTATTCATATTCAAGCCTTTTATTCCAATC 


2160 2170 2180 2190 2200 2210 2220

1840 1850 1860 1870 1880 1890 1900


— TGTAGGG— GC-GGGGAGACAGAGCCGCGGGGGAGCTACGAGAATCGACTCACAGGGCGCCCCGG-GCTT 

II I II II III I I II III II II III III I I II I II I 

AGTGCATGGTTGCTAGGGTAATTGGTACCC TAGCAACCAG-ATC-ACTAAAACTGCAAACTGGAGAAC 

2230 2240 2250 2260 2270 2280 

1910 1920 1930 1940 1950 1960 

CGC-AAATGAAA — CTTTTT TAATCTCACAAGTTTCGTCCGGGCTCGGCGGACCTATGGCGTCGATC 

ii in iii ii i ii iiiii ii i i m i ii ii 

TGCTGAATAAAAAGCTAAATAACAAAAAAAACACAA— ATAATAAAAAAT— GTAAACCAACTGC— AAATT 
2290 2300 2310 2320 2330 2340 2350 

1970 1980 1990 2000 2010 2020 2030 

CTTATTACCTTATCCTGGCGCCAAGATAAAACA-ACCAAAAGCCTTGACT — CCGGTACTAATTCTCCCTGC 

i i i i i mi in n i n ii inn ii i i in n in 

GTCAGAATATCACCCTG— TACAATCTACATCACACTAAAAG— TTAATTTAAAGGT— GAACAACCCCATA 
2360 2370 2380 2390 2400 2410 2420 

2040 2050 2060 2070 2080 2090 

CGGCCCCCGTAAGCATAACGCGGCGATCTCCACTTTA AGAACCTGGCCGCGTTCTGCCTGGTCT-C 

ii in ii i ii i mi i i iiiii i i i n n i 

AGGAAGACATA— CAATTTGTGGATACACACACTACAGACACTACAACCTAGATG-GCTCATTAAGGAATAT 
2430 2440 2450 2460 2470 2480 

2100 2110 2120 2130 2140 2150 2160 

GCTTTCGTAAACGGTTCTTACAAAAGTAATTAGTT — CT — TGCTTTCAGCCTCCAAGCTTCTGCT-AGTCT 

I III II III III I III 1 II II II III I I I II II I I III 

GATTTACATTTTATTTATTAAAAATGAAATGATTTAACTGTTGATTT-TGAAT — TGATTATGTTGATTCT 

2490 2500 2510 2520 2530 2540 2550 

2170 2180 2190 2200 2210 2220 2230 

ATGGCAGCATCAAGGCTGGTATTTGCTACGGCTGACCGCTACGCCGCCGCAATAAGGGTACTGGGCGGCCCG 

i i in ii nil i i i i n i i i i m i nil i n i 

AATGTTGAAT TGTTATTGGG'TGCTGAAAACTGATCATAGGGTGGAAT — GTATACTTTTC-TCCTG 

2560 2570 2580 2590 2600 2610 2620 

2240 2250 2260 2270 2280 2290 

— TCGAAG-GCCCTTTGGT — TTCAGAAACCCAAGGCCCCCCTCATACCAACGTTTCG ACT-TTGAT 

1 I I I II III I II III 1 1 I II I IIIII III II II 

AGATTGGTGTGGTGTTGGGTCTTACATAAATC — TTTACTTTGTACTATGATTTTTTCGAAAAACTCTTAAT 
2630 2640 2650 2660 2670 2680 2690 

2300 2310 2320 2330 2340 2350 

TCTTGCCGGTACGTGGTGGTGGGTGCCTTAGCTCTTTCTCGATAG— TTAGAC 

II I II I II I III I I I III I II I II I 

TAT GTAACTTCTTG — GAGTGAATAAAC-CTTAAT — ATTGCATTGGG 

2700 2710 2720 2730 X 
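
The Residue Identity figure printed in each alignment summary appears to be the number of matches divided by the total number of aligned columns (matches + mismatches + gaps), truncated to a whole percent. This is inferred from the printed numbers only and is not a documented formula of the search program; the function name below is hypothetical. A minimal sketch under that assumption:

```python
def residue_identity_percent(matches: int, mismatches: int, gaps: int) -> int:
    """Whole-percent identity over all aligned columns.

    Assumed formula (inferred from the report, not documented):
    floor(100 * matches / (matches + mismatches + gaps)).
    """
    columns = matches + mismatches + gaps
    if columns == 0:
        raise ValueError("alignment has no columns")
    return (100 * matches) // columns


# Alignment 5 (XELAEIP): Matches = 1185, Mismatches = 996, Gaps = 341
print(residue_identity_percent(1185, 996, 341))
```

Truncation rather than rounding is used because rounding would turn the same inputs into 47, which does not agree with the value printed in the report.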


6. ELLIS-012-FIG2AB.SEQ (1-2350)

HUMUT5094 Human chromosome 4 STS UT5094.

LOCUS HUMUT5094 468 bp ds-DNA PRI 28-MAY-1993

DEFINITION Human chromosome 4 STS UT5094.



KEYWORDS PCR primer; STS; microsatellite DNA; microsatellite marker;
repeat polymorphism; sequence tagged site; tetranucleotide repeat.

SOURCE Homo sapiens DNA.

ORGANISM Homo sapiens
Eukaryota; Animalia; Chordata; Vertebrata; Mammalia; Theria;
Eutheria; Primates; Haplorhini; Catarrhini; Hominidae.

REFERENCE 1 (bases 1 to 468)

AUTHORS Gerken,S.C., Matsunami,N., Lawrence,E., Carlson,M., Moore,M.,
Ballard,L., Melis,R., Robertson,M., Bradley,P., Eisner,T.,
Tingey,A., Rodriguez,P., Albertsen,H., Lalouel,J.-M. and White,R.

TITLE Genetic and physical mapping of simple sequence repeat containing
sequence tagged sites from the human genome

JOURNAL Unpublished (1993) See COMMENT for author address.

STANDARD full automatic

COMMENT Submitted by: Utah Center for Human Genome Research
University of Utah, Dept. of Human Genetics
2160 Eccles Institute of Human Genetics
Salt Lake City, UT 84112
e-mail: sts@corona.med.utah.edu


Primer A: CTGCACTCGAGCCTGAGCA
Primer B: CCTTGGAAATGAGGCTGCTC
32P-label: B Primer
PCR Profile:

Initial Denaturation: 94C 300sec
PCR Cycles: 5
Denaturation: 94C 10sec
Annealing: 62C 10sec
Extension: 72C 20sec
Mg++: 3mM

Gel: Acrylamide 7%, Formamide 32%, Urea 34%
Alleles: 6.

FEATURES Location/Qualifiers

misc_feature 47..291
/note="This feature applies to a gene which lacks a coding
region feature."
/map="4"

primer_bind 47..65

STS 47..291
/standard_name="STS UT5094"
/map="4"
/evidence="EXPERIMENTAL"
primer_bind complement(272..291)

source 1..468
/organism="Homo sapiens"
/sequenced_mol="DNA"

BASE COUNT 186 a 75 c 105 g 97 t 5 others
ORIGIN


Initial Score = 152 Optimized Score = 210 Significance = 7.95
Residue Identity = 51% Matches = 262 Mismatches = 173
Gaps = 75 Conservative Substitutions = 0


660 670 680 690 700 710 720 

TTCTGTGACTCCAGAGGGAGGACCAGGAGGGCACTCCTTGCAGGTCCTTACCTTGTTCCTGGCGCTGACATC 

llll nil! I! II 

ACTTGAGCCTGG GA-GTC 

X 10 


730 740 750 760 770 780 790 800 
GG-CTTTGCTGCTGGCCCTGATCTTCATTACTCTCCTGTTCTCTGTGCTCAAATGGATCAGGAAAAAATTCC 

II llll I I! llll I III! I III ill III I III III I II II 
GGAGGTTGCAG-TG-AGCTGA-CATCAT — GCCACTGCACTC-GAGC-CTGA-GCAACAG — AGCAAGACC 
20 30 40 50 60 70 


810 820 830 840 850 860


CCCACATATTCAAGCAACCATTT AAG-AAGACCACTGGAGCAGCTCAAG-AGG'A-AGA-T GCTTG-TAGCTG 

i i i ii ii i 111 mi i i i i 111 mi in i i ii 

CNGTTAAAAAAAAAGAA— GTNGAAGAAAGAAGAAAGAAAGAAAGAAAGAAGGAAAGAAAGAAAGAAAGAAA 
80 90 100 110 120 130 140 

870 880 890 900 910 920 930 

CCGATGTCCACAGGAAG-AAGAAGG— AGGAGG— AGGAGGCTATGAGCTG-TGATGTACTATCCTAGGAGA 

1 I II III lllll I 11 1 I II I I I I I I 1 I I 1 III III 

GAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGAAAGGNGAAGAAAAAGAAACAAGCTA-AAGA 
150 160 170 180 190 200 210 

940 950 960 970 980 990 1000 

TGTGTGGG-CCGAAACCGAGAAGCACTAGGAC-CCCACCATCCTGTGGAACAGCACAAGCAACCCCACCACC 

III III llll II II I I I III II I II III till II II II 

-GTGAGGGAGAGAAAAATTCAAACATGAAGTCTCCCTAGAGC — TTGANCAGGGTGAGCAGCCTCATTTCC 
220 230 240 250 260 270 280 

1010 1020 1030 1040 1050 1060 

CTGTTCTTACACATCATCCTAGATGATGTGTGGGCGCGCACC TCATCC AAGTCTCTTCTAAC 

i in i in n ii mi i i i i n i mm n n i i i 

AAGGNCTTGC CATTGT-CATCATGTCT-GCCCCTCAACATGTTCATCCGACAAAAAATCATATTTGAT 

290 300 310 320 330 340 350 

1070 1080 1090 1100 1110 1120 

GCT AACAT-ATT-TGTCTTTACCTTTTTTAAATCTTTT-TTTAAATTTAAATTTTATGTG — TGTGAG 

m mi m n mi i i inn n i iiimn nil in u 

GCTTTGCTACATGATTGTGATGTTACATGAGCTGTTCCTTTTGCTTTGCTATAAATTTTTTGTGTTTGTAAG 
360 370 380 390 400 410 420 

1130 1140 1150 1160 1170 1180 X 1190 1200 

TGTTTTGCCTGCCTGTATGCACACGTGTGTGTGTGTGTGTGTGTGACACTCCTGATGCCTGAGGAGGTCAGA 

iii ii mini i n in nil n 

CATCTAG — AAATG-ATGCACAAG-CACCGT TAATTCA-ACTCAATAT 

430 440 450 460 X 

1210 1220 1230 

AGAGAAAGGGTTGGTTCCATAAGAACTGGAGTTA 
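
The BASE COUNT line of a record can serve as a quick consistency check against the sequence length on its LOCUS line. For HUMUT5094, 186 a + 75 c + 105 g + 97 t + 5 others gives 468, the length stated for the entry. The helper below is a hypothetical sketch, not part of the search program; it assumes the counts appear as "number label" pairs on a single line.

```python
import re


def base_count_total(line: str) -> int:
    """Sum every '<number> <label>' pair on a BASE COUNT line."""
    prefix = "BASE COUNT"
    if not line.startswith(prefix):
        raise ValueError("not a BASE COUNT line")
    # Each pair is a count followed by a label such as a, c, g, t or others.
    pairs = re.findall(r"(\d+)\s+([A-Za-z]+)", line[len(prefix):])
    return sum(int(count) for count, _label in pairs)


line = "BASE COUNT 186 a 75 c 105 g 97 t 5 others"
print(base_count_total(line) == 468)  # compare with the LOCUS length
```

A mismatch between the total and the LOCUS length usually points to a misread digit in one of the two lines.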


7. ELLIS-012-FIG2AB.SEQ (1-2350) 

S53907 XRAR alpha 2=retinoic acid receptor isoform alpha


LOCUS S53907 3240 bp mRNA VRT 23-MAR-1993

DEFINITION XRAR alpha 2=retinoic acid receptor isoform alpha 2.1 [Xenopus,
embryos, mRNA, 3240 nt].

ACCESSION S53907

KEYWORDS

SOURCE Xenopus embryos

ORGANISM Xenopus
Unclassified.

REFERENCE 1 (bases 1 to 3240)

AUTHORS Sharpe,C.R.

TITLE Two isoforms of retinoic acid receptor alpha expressed during
Xenopus development respond to retinoic acid.

JOURNAL Mech. Dev. 39, 81-93 (1992)

STANDARD full automatic

COMMENT This entry [NCBI gibbsq 123865] was created by the journal scanning
component of NCBI/GenBank at the National Library of Medicine.
This sequence comes from Fig. 1A.

FEATURES Location/Qualifiers

CDS 516..1910
/note="isoform XRAR alpha 2.1: For the protein sequence
(NCBI gibbsq 123867): Method: conceptual translation
supplied by author. This sequence comes from Fig. 1A."
/gene="XRAR<alpha>2"
/codon_start=1

/translation="HVSLDFSRHYENVDVPALABSPTRFHNHDFYSHNRBCLLaEKGI
GTIHPYGTPLRTflHWSSSNHSIETflSTSSEEIVPSPPSPPPLPRIYKPCFVCfiDKSSG 
YHYGVSACEGCKGFFRRS I QKNMVYTCHRDKNC I INKVTRNRCQYCRLQKCFEVGMSK 
ESVRNDRNKKKKESPKPEA1ESYILSPETQDLIEKVSKAHQETFPALCQLGKYTTSFS 
SESR VSLO 1 DLUDKF5ELSTKC I I KT VEF AKeLPGFTTLT I ADfi 1 TLLKSAGLD I L I L 
RICTRYTPDQDTNTFSDGLTLNRTQMHNAGFGPLTDLVFAFANQLVPLEI1DDAETGLL 
SAICLI CGDRQDLEQPDKVDKLGEPLLEALKI YVRTRRPQKPHMFPKMLMKITDLRSI 
SAKGAERVITLKHEIPGAHPPLI8EHLENSEGLDTLGGGASSDAPVTPVAPGSCSPSL 
SPSSTHSSPSTHSP” 

BASE COUNT 811 a 852 c 791 g 788 t 
ORIGIN 


Initial Score = 151 Optimized Score = 961 Significance = 7.88
Residue Identity = 46% Matches = 1160 Mismatches = 1053
Gaps = 295 Conservative Substitutions = 0


X 10 

ATGTCCATGAACTG — CTGAG 

II I III I II 

TCTACAGTCACAACCGACAGTGCCTTTTGCAGGAGAAAGGGATTGGGACCATTCACCCGTACGGGACCCCAC 
610 620 630 640 650 X 660 670 

20 30 40 50 60 70 80 

TGGATAAACAGCACGGGA-TATCTCTGTCTAAAGGAATATTACT-ACACCAGGA AAAGGACACATTCG 

I I II III III I III II II III I I II II I III I III 

TACGGACTCAACACTGGAGCAGCTCCAACCACTCAATTGAGACTCAAAGCACGAGTTCAGAGG— AGATTGT 

680 690 700 710 720 730 740 

90 100 110 120 130 140 150 

ACAACAGGAAAGGA-GCCTGTCACAGAAAACCACAGTGTCCTGTGCATGTGACATTTCGCCATGGGAAACAA 

II III I II III I III I I I I I II I I I II II II I 

AC-CCAGCCCCCCATCCCCACCACCGCTCCCCAGAATCTACAAGCCCTGCTTTGTGT-GTCA-GGACAAGAG 

750 760 770 780 790 800 810 

160 170 180 190 200 210 

CT — GTTACAACGTGGTGGTCA TTGTG-CTGCTGCTAGTGGGCTGTGAGAAGGTGGGAG — CCGTGC 

i i ii ii i mi mu i ill i iiiii i n i n i i 

TTCGGGGTATCACTATGGAGTCAGCGCTTGTGAAGGTTGCAA— GGGCTTT— TTCCGTCGCAGTATCCAGA 
820 830 840 850 860 870 880 


220 230 240 250 260 270 280 

AGAAC-TCCTGTGATAACTGTCAGCCTGGTACTTTCTGCAGAAAATAC-AATCCAGTCTGCAAGAGCTGCCC 

him i m i i mu i i mi i i ii ii ii i mi mm 

AGAACATGGTGT-ACACGTGTCACAGAGACAAGAATTGCATCATAAACAAAGTCACGC-GCAACCGCTGCC- 
890 900 910 920 930 940 950 

290 300 310 320 330 340 350 

TCCAAGTACCTTCTCCAGCATAGGTGGACAGCCGAACTGTAACATCTGCAGAGTGTGTGCAGGCT-ATTTCA 

I I II I III I I I I II I II III III I II I I I II II 

-AGTATTGCCGATTGCAGAAATGTTTCGAGGTCGGAATG TCTAAAGAATCCGTACGGAATGATCGCA 

960 970 980 990 1000 1010 

360 370 380 390 400 410 420 

GGTTCAAGAAGTTTTGCTCCTCTACCCACAACGCGGAGTGTGAG-TGCATTGAAGGATTCCATTG-CTTGGG 

IIIII III I I I I II I I III I III I II III I I I 

ACAAGAAGAAAAAGGAGTCCCCAAAGCCTGAGGC-AATAGAGAGTTACAT — ACTGAGCCCAGAGACACAAG 
1020 1030 1040 1050 1060 1070 1080 

430 440 450 460 470 480 490 

GCCACAGTGCACCAGATGTGAAAAGGACTGCAGG — CCTGGCCAGGAGCTAACGAAGCAGGGTTGCAAAAC 

I II II I I II IIIII I till III III I II III II IIIII 
ATCTCATTG-AGAAAGTGCAAAAAGCCCACCAGGAGACCTTCCCTGCA-CTCTGCCAGCTGG — GCAAATA 
1090 1100 1110 1120 1130 1140 1150 


GTGT AGCTT GGG'AACATTTA — ATGACCAGAACGGTACTGGCGTCTGTCGACC-CTG-GACGA ACTG 


CACTA CAAGTTTTAGCTCGGAGCAGCGGGTTTCTCTGGAC-ATCGACCTGTGGGACAAGTTCAGTG 

1160 1170 1180 1190 1200 1210 


570 580 590 600 610 620 630 

CTCTCTAGACGGAAGGTCTGTGCTTAAGACCGGGACCACGGAGAAGG'ACGTGGTGTGTGGACCCCCTGTGGT 

mi n u i i i i mu ii i n ii mi m i i i 

AGCTCTCCACTAAA — TGTATCATCAAGACGGTGGAATTTGCCAA — AC-AGTTGCCGGGATTCACCACTCT 
1220 1230 1240 1250 1260 1270 1280 


640 650 660 670 680 690 

GAGCTTC-TCTCCCAGTACCACCATTTCTGTGA — CTCCAG-AGGGAGGACC— AGGAGGGCACTCCTTGCA 

ii i ii i mi i mi i m i i ii i m ii in mu 

GACCATCGCCGACCAG-ATCACC-CTCCTGAAATCCGCCTGCCTGGATATTCTTATCCTGCGAAT— TTGCA 
1290 1300 1310 1320 1330 1340 1350 

700 710 720 730 740 750 760 

GGTCCTTA— CCTTGTTCCTGGCGC— TGACATCGGCTTTGCTG — CTGGCCCTGATCTTCATTACTCTCC 

I III II II II III llllll II I I III till I I II | || 

-CACGTTACACCCCTGATCAGGACACCATGACAT— TCTCAGACGGACTGACCCTAAACCGCACTCAGATGC 


1360 1370 1380 1390 1400 1410 1420

770 780 790 800 810 820


TGTTCTCTGTGCTCAAATGGATCA-GGA— AAAAATTCCCCCAC-ATATTCA — AGCAACCATTTAAGAAG 

I I I I II I III II III II I II III I I II II I 

ACAACGCGGGGTTCGGACCTCTCACAGACCTGGTCTTCGCCTTCGCTAATCAGCTCGTGCCGCTTGAAATGG 
1430 1440 1450 1460 1470 1480 1490 

830 840 850 860 870 880 890 

ACCAC-TGGAG-CAGCTCAAGAGGAAGATGCTTGTAGCTGCCGATGTCCACAG-GAAGA — AGAAGGAGGAG 

ii ii m i i ii i i ii m i mu ii i i i m i mi i 

ACGACGCTGAGACCGGTCTACTG — AG-TGC — CATCTGCC TCATCTGTGGAGACCGGCAGGACCTG 

1500 1510 1520 1530 1540 1550 

900 910 920 930 940 950 960 

GAGGAGGC-TATGAGCTGTGATGTACTATCCTAGGAG-ATGTGTGGG--CCGAAACCGAGAAGCACTAGGAC 

III II I II I II II III lllll I I I II II I II I lllll 

GAGCAGCCAGATAAAGTG-GACAAACT — GCAGGAGCCTCTTTTGGAAGCGTTAAAGATCTACGTCAGGAC 
1560 1570 1580 1590 1600 1610 1620 

970 980 990 1000 1010 1020 1030 

CCCACCATCCTGTGGAACAGCACAAGCAACCCCACCACCCTGTTCTTACACATCATCCTAGATGATGTG — T 

i i i ii m mi i mu i ii ii i i mi i i ii i i i 

CAGGCGA-CCCCAAAAACCTCACATG-TTCCCCAAAA — TGCTCATGAAGATCA-CAGACCTGCGGAGCAT 
1630 1640 1650 1660 1670 1680 1690 

1040 1050 1060 1070 1080 1090 1100 

GGGCGCGCACCTCATCCAAGTCTCTTCTAACGCT-AACATATTTGTCTTTACCTTTTTTAAATCTTTTTTTA 

I II II I II I I I II II II II II I III 

CAGTGC-CAAGGGTGCGGAGCGTGTGATCACTCTGAAGATGGAGATCCCGGGGGCCATGCCCCCCCTCATCC 
1700 1710 1720 1730 1740 1750 1760 

1110 1120 1130 1140 1150 1160 1170 

AATTTAAATTTTATGTGTGTGAGTGTTTTGCCTGCCTGTATGCACACGTGTGTGTGT — GTG-TGTGTGTG 

I I II I I III I II I I II I III I I I III II I 

AGGAGATGTTGGAGAACTCGGAGGGGTTGGACACATTGGGGG GTG-GGGCATCCAGTGATGCACCAG 

1770 1780 1790 1800 1810 1820

1180 1190 1200 1210 1220 1230 

ACACTCCTGATGCCTGAGGAGGTCAGAAGAGAAAGGGT TGGTTCCA-TAAGAACTGGAGTT — ATG 

m ii i ii mi i i i ii ii i mm i i i i i ii 

TCACACCAGTAGCACCAGGAAG-CTGCAGTCCCAGTCTGTCTCCCAGTTCCACTCACAGCAGCCCCTCCACT 
1830 1840 1850 1860 1870 1880 1890 

1240 1250 1260 1270 1280 1290


GA-TGGCT GTGAGCCGGNNNG'AT AGGT C — GGGACGGAGACCTGT — CTTC — TTATTTTAACGTGAC-TG 

m i 111 ii ii i i ii i 111 i mi mi n i i n 

CACTCACCCTGACCCCCCCACCCAGAACAAATGCACAGCCCCCTCTCACTTCCTTTATCTTTTTGCCCCTTG 
1900 1910 1920 1930 1940 1950 1960 1970 

1300 1310 1320 1330 1340 1350 1360 

T — ATAATAAAAAAAAAATGATATTTCGGGAATTGTAGAGATTGTCCTGACACCCTTCTAGTTAATGATCT 

i i i i n i i i i n n i n ini i i n 

TCCCGCCCCCTTTTCTACTTCCTTTTCCTCTTACTTGACAGCCACTCATCACTGTGCTCTACCT--GTACCT 
1980 1990 2000 2010 2020 2030 2040 

1370 1380 1390 1400 1410 1420 1430 

AAGAGGAATTGT-TGATACGTAGTATACTGTATATGTGTATGTATATGTATATGTATATATA-AGACTCTTT 

i i i in n i n n n n i ii i i i i i i i i linn 

GGAATGCAATGTGTGCAAGAGAGGATTGTGGGTAAGGAGGAGGAGGAGCA-AAGCAGCTGGAGATGCTCTTT 
2050 2060 2070 2080 2090 2100 2110 

1440 1450 1460 1470 1480 1490 1500 

TACTGTCAAAGTCAACCTAGAGTGTCTGGTTA-CCAGGTCAATTTTATTGGACATTTTACGTCACACAC — 

in ii in iii mi in n i i i i i n i i i in 

TAC — ACCACTTCA — CCACAAACTCTGCTTATTGGGGCCTCGTGGGGTCTCCCTGTTCTGCCTC-CACTGG 
2120 2130 2140 2150 2160 2170 

1510 1520 1530 1540 1550 1560 1570 

ACACACACACACACA-CACAC ACGTTTATACTACGTACTGTTATCGGTATTCTACGTCATATAATG 

1 1 I II I I II II 1 II I I III I II II I I I ' I 

ATTGAGAAGGACTGATGAGACTGGGGGACCGACACACGAGGGGCTG-GAGGAACATGCTCAGACTGCTGGGG 
2180 2190 2200 2210 2220 2230 2240 

1580 1590 1600 1610 1620 1630 1640 

GGATAGGGTAAAAGGAAACCAAAGAGTGAGTGATATTATTGTGGAGGTGACAGACTACCCCTTCTGGGTACG 

in n n i in n i i i mi i i i n i n in in i n 

GGACAGTGT-CTCGCCCACCTCAG-GGGGGGGATACGCT— GGAATGTTA — AC— CCCTTTCACTGCACC 
2250 2260 2270 2280 2290 2300 2310 

1650 1660 1670 1680 1690 1700 1710 

TAGGGACAGACCTC-CTTCGGAC— TG— TCTAAAACTCCCCTTAGAAGTCTCGTCAAGTTCCCGGACGAAG 

i n in i i n n i n n i i n i i i in i i mi 

CCAG— AAGCACTCTGTATGTACATTGGGTTTAGAAATGCAAACAGTATTTT— ACAA CTATATGAAG 


2320 2330 2340 2350 2360 2370

1720 1730 1740 1750 1760 1770 1780


AGGACAGAGGAGACACAGTCCGAAAAGTTATTTTTCCGGCAAATCCTTTCCCTGTTTCGTGACACTCCACCC 

in i i i n i ii inn ini i i inn n n i 

TGTAAA ACAAATAGGGCAGCTTGATTTTTTTAATGGAAA — CAGTAGGTGTTT-TAGAATCTATATAA 

2380 2390 2400 2410 2420 2430 2440 

1790 1800 1810 1820 1830 1840 1850 

CTTGTGGACACTTG-AGTGTCATCCTTGCGCCGGAAGGTCAGGTGGTACCCGTCTGTAGGGGCGGGGAGACA 

Ml I III II 1 II I II III I II 1 II I I II I 
ATTGTTATAAATTGTATTTTTGCTATTTCAATTTCAGTTCACACAAAAACC-TTTG-AAAAAAGTGGCTCCT 
2450 2460 2470 2480 2490 2500 2510 

1860 1870 1880 1890 1900 1910 

GAGCCGCGG — GGGAGC TACG'AGAATC— GACTC — ACAGGG-CGCCCCGGGCTTC-GCAAATGAA 

I I II I I III I I I III I III Ml II I III I I Ml I 

GTGTGGCAGTAATGTAGCCCTGTCCCA-CATCATGCCTCGGGGTAGGGACGTGTC-TGCTACTGACAATGGA 


2520 2530 2540 2550 2560 2570 2580

1920 1930 1940 1950 1960 1970


ACTTTTTTA-ATCTCACAAGTTTCGTCCGGGCTCGGCGGACCTAT-GGCGTCGATCCTT — ATTACCTTAT 

II II I III III I I II I III I I I I I I II Ml II 

ACACAACTACAGCTCCCAA CAACTGGAAAAAGGGGAGTTTTAGTTCTGCAACATTGGCTGTACCACAT 

2590 2600 2610 2620 2630 2640 

1980 1990 2000 2010 2020 2030 2040


CCTGGCGCCAAGAT AAAACAACCAAAAGCCTTGACT C— CG — GTACT — AATTCTCCCTGCCGGCCCC 

llll I II III ill I III I II I II I I I llll I II II 
GCTGCCTGTCAC-CTGTATATTTCAAGC — TGGCCAT-ACACACGA AGCAATACAATTGTACC-AAATACACG 
2650 2660 2670 2680 2690 2700 2710

2050 2060 2070 2080 2090 2100 2110 

CGTAAGCATAACGCGGCGATCTCCACTTTAAGAACCTGGCCGCGTTCTGCCTGGTCTCGCTTTCGTAAACGG 

II II I II I I I III II I I I I I II I I I II I I 

TGTGTGCTGGGGGGGGGGGTTGTGA — TAACGTTGTGCCATGGATATATC-AGTTGGGTACTGGAAACCAG 
2720 2730 2740 2750 2760 2770 2780 

2120 2130 2140 2150 2160 2170 2180 

TTCTTACAAAAGTA-ATTAGTTCTTGCTTTCAG-CCTCCAA--GCTTCTGCTAGTCTATGGCAGCATCAAGG 

n i i in n in n i in i i i i mi nn n n ii 

CATTTGCTCATGTATATCTGTTAGTG — TACAGTGCACTGACTACATCTGTTAGTGTA — CACTTACTATA 
2790 2800 2810 2820 2830 2840 2850 

2190 2200 2210 2220 2230 2240 

CTGGTATTTGCTACGGCTGACCG-CTAC-GCCGCCGCAATAAGG-GTACTG GGCGG-CCCGTCGAAG 

in i inn ii i nil ii i n i nil in i i i 

TCGGT-TAGAGTACGGTGCACTGACTACATCTGTTAGTGTTCGGTGCACTGACACAAGAGGATACAATGGCG 
2860 2870 2880 2890 2900 2910 2920 

2250 2260 2270 2280 2290 2300 2310 

GCCCTTTGGTTTCAGAAACCCAAGGCCCCCCTCATAC-CAACGTTTCGACTTTGATTCTTGCCGGTACGTGG 

II I II I L I I II I llll I II II II I I II III I I I I 
GCTATACTGTGT-ATGGTCACTAGTACAGTATCATCCTGAAAGT — GA--TAGCTTATTGGCTGGA— TAG 
2930 2940 2950 2960 2970 2980 

2320 2330 2340 2350 

TGGTGGGTGCCTTAGCTCTTTCTCGATAGTT — AGAC 

II I II II I I I I I I III 

TGAGCAATACC-AGGCAGTGTATGGCCACCTTGGGGACCCCCAGGCCTGCTTTTGCACTGCCCACCCTAATA 
2990 3000 3010 3020 X 3030 3040 3050 

TAGTTGTTTTAACAGT 
3060 3070 
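
Each alignment is introduced by a short block of "name = value" statistics (Initial Score, Optimized Score, Significance, Residue Identity, Matches, Mismatches, Gaps, Conservative Substitutions). The sketch below shows one way such a block could be read into a dictionary for further checks. The function name and the regular expression are assumptions made for illustration; they are not part of the search program and expect each name and its value to sit on the same line.

```python
import re

# A name is one or more words separated by single spaces; a value is an
# integer or a decimal number. A trailing percent sign is ignored.
_PAIR = re.compile(r"([A-Za-z]+(?: [A-Za-z]+)*)\s*=\s*(\d+(?:\.\d+)?)")


def parse_summary(text: str) -> dict:
    """Return the 'name = value' pairs of an alignment summary block."""
    stats = {}
    for name, value in _PAIR.findall(text):
        stats[name] = float(value) if "." in value else int(value)
    return stats


summary = """Initial Score = 151 Optimized Score = 961 Significance = 7.88
Residue Identity = 46% Matches = 1160 Mismatches = 1053
Gaps = 295 Conservative Substitutions = 0"""

for name, value in parse_summary(summary).items():
    print(name, value)
```

Keeping integers and decimals as distinct types makes it easy to feed Matches, Mismatches and Gaps straight into integer arithmetic.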


8. ELLIS-012-FIG2AB.SEQ (1-2350) 

PFASXC Plasmodium falciparum sexual stage mRNA sequence.


LOCUS PFASXC 2306 bp ss-mRNA INV 06-APR-1993

DEFINITION Plasmodium falciparum sexual stage mRNA sequence.

ACCESSION M64106

KEYWORDS

SOURCE Plasmodium falciparum (strain 3D7) sexual stage-gametocyte and
gamete cDNA to mRNA.

ORGANISM Plasmodium falciparum
Eukaryota; Animalia; Protozoa; Apicomplexa; Sporozoa; Coccidia;
Eucoccidiida; Haemosporina; Plasmodiidae.

REFERENCE 1 (bases 1 to 2306)

AUTHORS Alano,P. and Elliott,J.F.

JOURNAL Unpublished (1993)

STANDARD full automatic

FEATURES Location/Qualifiers

5'UTR 1..43
/note="putative"
polyA_site 2306
/note="immature polyA_site; probable polyA_tail derived
from oligo-dT priming; sequence differs from the germline
sequence in this region; putative"
source 1..2321
/organism="Plasmodium falciparum"
/strain="3D7"
/sequenced_mol="cDNA to mRNA"
/germline

BASE COUNT 905 a 279 c 371 g 751 t
ORIGIN


Initial Score = 150 Optimized Score = 808 Significance = 7.81 

Residue Identity = 46% Matches = 992 Mismatches = 847

Gaps = 275 Conservative Substitutions = 0 

320 330 340 350 360 370 380 390 

CCGAACTGTAACATCTGCAGAGTGTGTGCAGGCTATTTCAGGTTCAAGAAGTTTTGCTCCTCTACCCACAAC 

II II I II I II I I 

GTATTTTCCATCAATTCATATC 
X 10 20 

400 410 420 430 440 450 460 

GCGGAGTGTGAGTGCATTGAAGGATTCCATTGCTTGGGGCCACAGTGCACCAGAT-GTGAAAAGGACTGCAG 

i i mi ii i ii iiiii ii i ii ini i n m 

GTTTA-AATAATTTTTTTCCA-AATGAAGTTGCTGTTGTTC TTTTTCATATCGTCTATCTTCCTTCAG 

30 40 50 60 70 80 

470 480 490 500 510 520 

GCCTGGCC-AGGAGCTAACGA-AGCAGGGTTGCAAAAC — CTGTA GCTT-GGGAA-- CATTTAATGAC 

III II I 1 II II I I II I ill I II III llll II III 1 
— CTGACCTCTGGGAAAAGGATATTAAATTTTGATAACATCATTAAACATCTTAAGGAAAGCAAATTATTGC 
90 100 110 120 130 140 150 

530 540 550 560 570 580 

CAGAACGGTA — CTGGCGTCTGTCGACCCTGGACGAACT — GCT-CTCTAGACGGAAGGTCTGTGCTTAA 

I III I II II III I I II II II I I I III I II M I II 
CTGAA-GATATCCCTCACGT-TTTAGAAAATGACATAATTATAGTTCCTCCTTATTTAA— TTTATAAAT-A 
160 170 180 190 200 210 220 

590 600 610 620 630 640 650 

GACCGGGACCACGGA-GAAGGACGTGGT-GTGTGGACC-CCCTGTGGTGAGCTTCTCTCCCAGTACCACCAT 

I II I I II II I I III llll II II I I I I II I I I II 

TAAAGGAAAAATATATCACCTACATAATAATGTAGACCTTACTTTGATAAAC CATCCTG-AAGAAGAT 


230 240 250 260 270 280 290

660 670 680 690 700 710 720


TTCTGTGACTCCAGAGGGAGGACCAGGAGGGCA— CTCCTTGCAGGTCCT TACCTTGTTCCTG-GCGC 

I II I I I III II I III I I III I llll llll I III || 

TCATG CGATAAGGAAGA-AATTTGGGAATCCCCATTTC — CTCCTAAAGTACC-GGAACCGGAACGA 

300 310 320 330 340 350 

730 740 750 760 770 780 790 

TGACATCGGCTTTGCTGCTGGCCCTGATCTTCATTACTCTCCTGTTCTCTGTGCTCAAATGGATCAGGAAAA 

III II I I III I II I I II I I I ill II I llll I I 
CCACA-CGAACCAGAAATGGAACCTCAGGTTGAACCCGAACCAGGAC-CATTGCCTGAAGAGGTCAG— AGA 
360 370 380 390 400 410 420 

800 810 820 830 840 850 860 

AATTCCCCCACATATTCAAGC — AACCA-TTTAAGAA-GACCACTGGAGCAGCTCAAGAGGAAGATGCTTGT 

i i ii i iiii iiiii mu i i mi ii iiiii mi ii hi 

ACCTGAACCGGAACCAGAAGCAGAACCAGAAAAAGAATTAGAAATGGA — AGAACAAGAAGAAG-TGATTGA 
430 440 450 460 470 480 490 

870 880 890 900 910 920 930 

AGCTGCCGATGTCCACAGGAAGAAGAAGGAGGAGGAGGAGGCTATGAGCTGTGATGTAC TATCCTAGG 

Hill III I II llllll III I I I I I 1 I II I llllll 
AGCTG-ATATG-GTATTAGACGAAGAAACTGGAATTAAAATCCCTAAAAAGACA-GAACAAGATGTCCTAGA 
500 510 520 530 540 550 560 

940 950 960 970 980 990 

AGATGTGTGGGCCGAAACCGAGAAGCACTAGGA — CCCCACCATCCTGTGGAA-CAGCAC — AAGCAACC 

i ii ii i 1 1 1 1 1 1 1 i 1 1 1 1 iiiii 


■ 1 1 1 1 1 1 i i i 


mi i 



A GTAAGCAAATTTCGAGAAG-AATATGAATTACCTAACGT-- TGTGGAATTAACTCCTGAAG-AGAA 


570 580 590 600 610 620 

1010 1020 1030 1040 1050 1060 


CCACCACCCTGTTCTTACACATCATCCTAGATGATGTGTGGGC-GCGCACCTCATCCAAGTCTCTTCTAACG 

II I I II INI I I II III I I II I I III I II 
AGAAAAGAATAATATTTTACATTTTGC-AGGTAATAAAAGTACAGCTTTCAATTTAAAAGAGATTATAAATT 
630 640 650 660 670 680 690 

1070 1080 1090 1100 1110 1120 1130 

CTAACATATTTGTCTTTACCTTTTTTAAATCTTTTTTTAAATTTAAATTTTATGTGTGTGAGTGTTTTGCCT 

III I I II I III I III III I I II III II I I I I I I 

ATAAAAAAGATG AAAGTTTAATGAATAGTTTATCTAGTT — CCTTTGATCATTTTTA-TACTCCTAAT 

700 710 720 730 740 750 760 


1140 1150 1160 1170 1180 1190 1200 

GCCTGTATGCAC-ACGTGTGTGT-GTGTGTGT-GTGT-GACACTCCTGATGCCTGAGGAGGTCAGAAGAG-A 

I II I I III I II II II II III II I I llll I I II I Hill 

G-TTG-AAGAACGAAATTTGAGTAATGCTTATAATGTGGATATTAATGAT-TATTATGATTTATTAAGAGCG 
770 780 790 800 810 820 

1210 1220 1230 1240 1250 1260 1270 

AAGGGTTGGTTCCATAAGAACTGGAGTTATGGATGGCTGTGAGCCGGNNNGATAGGTCGGGACGGAGACCTG 

I II III II I II II llll I II I II II I 

TTACATATATTATGTAAAGAC-GAAGATAATAA-GTTATATACAATAAATAGAATACCTAAAGATGAGTTA 


830 840 850 860 870 880 890 

1280 1290 1300 1310 1320 1330 


TCTTCTTATTTTAA-CGTG — ACTGTA — TA-ATAAAA AAAAAATGATATTTCGGGAATTG — TAG 

i i ii i iiii i ii iiii ii iii ii iiiiii mm n n n 

TTATTTTTTCTTAAGCATGCATATGTAAATATATATAATTCTTTAAAAAAATATATTTTATTAAATGGCTAT 
900 910 920 930 940 950 960 970 

1340 1350 1360 1370 1380 1390 1400 

AGATTGTCCTGACACCCTTCTAGTTAATGATCTAAGAGGAATTGTTGATACGTAGTATACTGTATATGTGTA 

i in i ii ii ii ii in i mu ii i i i i i i i i ii 

A-ATTTTGAAGA TTATATATATACCTCTGA — GAATT-TTACTTTAGATCAAATTTTTAAAGATTA 

980 990 1000 1010 1020 1030 

1410 1420 1430 1440 1450 1460 1470 

TGTATATGTAT— ATGTAT-ATA-TAA-GACTCTTTTACTGTCAA-AGT-CAACCTAGAGTGTCTGGT-TAC 

II I I III I II III III II I II I I III I II 1 I III I I II 

TTTTTTTTTATCAAACGATGATACTAATGAAAATGGTAGTTTTAATAATATAATCGAAAGTATAAAATATAT 
1040 1050 1060 1070 1080 1090 1100 

1480 1490 1500 1510 1520 1530 

CAGGTCAATTTTATTG-GACATTTTACGTCACACACACACACACACACACACACACACGTT TATA 

II I II III I I III I II I I IIIIII I I I II llll 
CAAG — AAAGCTATAGAAAAATTAAATGTAAAAAGAATAGA-AGAGAAGATAAAATATTTTTTTCAAATATA 
1110 1120 1130 1140 1150 1160 1170 

1540 1550 1560 1570 1580 1590 1600 

CTACGTAC-TGTTATCGGTATTCTACGTCATATAATGGGATAGGGTAAAAGGAAACCAAAGAGTGAGTGATA 

I llll II I I I III II I III I III I llll I III I I I I 

TGAGGTACATGCTTTTG— ATTTTAAAT— TATTACATTATATATTTTCTCGAAATC AGTTA-TTAAA 

1180 1190 1200 1210 1220 1230 

1610 1620 1630 1640 1650 1660 1670 

TTATTGTGGAGGTGACAGACTACCCCTTCTGGGTACGTAGGGACAGACCTCCTTCGGACTGTCTAAAACTCC 

nil in n in i i i n i in n i i i in i n i iiii 

TTATAGTGAAGATGATTTAATTCACC-ACGCAGTAGATA-TTATGAATATTACAAGGA-TAGATATATCTCC 
1240 1250 1260 1270 1280 1290 1300 

1680 1690 1700 1710 1720 1730 1740 

CCTTAGAAGTCTCGTCAAGTTCCCGGACGAAGAGGACAGAGGAGACACAGTCCGAAAAGTTATTT-TTCCGG 



AAGGGTAAT--ATCTGCCTTATTTA-TGTATTTCTTAAATA-AG — GTAAATATATTTCTTGTAC 

1310 1320 1330 1340 1350 1360 

1750 1760 1770 1780 1790 1800 1810 

CA-AATCCTTTCCCTG-TTTCG — TGACACTCCACCCCTTGTGGA-CAC-TTGAGTGTCATCCTTGCGCCG 

II ill III II till I II I llll III III II I I II 
CACAATATGTAACAAGAAATAGAAATGACTTTACATTATTATTGGAGCACAATGA-TTTATTATTAAGGACA 


1370 1380 1390 1400 1410 1420 1430 

1820 1830 1840 1850 1860 1870 


GA AGGTC-AGGTGGT — ACCCGTCTGTAGGGGCGGGGAGAC-AGAGCCGCGGGGGAGCTACGAGAA 

II I II II I I I I I II II I III II I I I I INI 

GAGATAATATATCGAGATTTTTTAAAACATTTTTTTAAGC-ATAAAACA — CCTC-ATGTACATTTGAAAA 
1440 1450 1460 1470 1480 1490 1500 

1880 1890 1900 1910 1920 1930 1940 

TCGACTCACAGGGCGCCCCGGGCTTCGCAAATGAAACTTTTTTAATCTCACAAGTTTCG — TCCG-GGCTC 

ii ii i iii ill i i ii i in mu i i ii i i 

AAAACAATCATG ATAATGCATACCAATT — AGTTCCGTGGTCT--CAAGTATTGTTTTCTGAATTTA 

1510 1520 1530 1540 1550 1560 

1950 1960 1970 1980 1990 2000 

GGCGGACCTATGGC— GTCG-ATCCTTATTACCTTATC — CTGGCGCCAAGAT — AAAACAACC AA 

i mini i in n i i n i in ii n in i n i i 

ATAGTACCTATGACTTTTCGAATTTTAAATATATGATCATATTGTTTCATGATTCTTATCATGCTTTTGTAG 
1570 1580 1590 1600 1610 1620 1630 

2010 2020 2030 2040 2050 2060 2070 

A-AGCCTTGACTCCGGTACTAATTCTCCCTGCCGGCCCCCGTAAGCATAACGCGGCGATCTCCACTTTAAGA 

I llll I I III I I I I II llll II I III II I I 
ATTATTTTGAAGGAGATGATAAAT-TAAATCAAGTATTGAATGATAATAAAGAGAATAAGACCA — TTGATA 
1640 1650 1660 1670 1680 1690 1700 

2080 2090 2100 2110 2120 2130 

ACCTGGCCGCGTTCTGCCTGGTCTCGCTTTCGTAAA-CGGTTCT — TACAAAA-GTA — ATTAGTTCTTGC 

in n i i i i n mi in i n nil in i n n i i 

ATTTAGAT AAATTTT — TTAAT — GAGTTATTAAATTTGTTTTTAATAGAAAATGTAGGAATA-TTATCGA 
1710 1720 1730 1740 1750 1760 1770 

2140 2150 2160 2170 2180 2190 2200 2210 

TTTCAGCCTCCAAGCTTCTGCTAGT-CTATGGCAGCATCAAGGCTGGTATTTGCTACGGCTGACCGCTACGC 

II II II II III II I II I I III I III III I II 

CATTGG — AAGAATATTTTGTAAGTACTGTAAAACAAGTTGGACGAGTACT — TTCAGATGATCATGACAT 


1780 1790 1800 1810 1820 1830 

2220 2230 2240 2250 2260 2270 2280 


CGCCGCAATAAGGGTACTGGGCGGCCCGTCGAAGGCCCTTTGGTTTCAGAAACCCAAG-GCCCCCCTCATAC 

i i i in n n in in i inn in n nil 

TGATTCGA-AAGAATA TACGGAGAA — TTATTTTACTCCAGAAGAAGAAGAGCAAGCTTTAAAA 

1840 1850 1860 1870 1880 1890 1900 

2290 2300 2310 2320 2330 2340 

CAACGTTTCGACTTTGATTCTTGCCGGTACGTGGTGGTGGGTGCCTTA-GCTCTT — TCTCGATAGTTA — 

i i n in in i mi i i in i i in i i i i i in in 

GATTTTAAAGAAGATGAGGCTT — CTGTACATCAT-TTGGAT-CATTACGATAATAATATGGATCCTTATTA 
1910 1920 1930 1940 1950 1960 


2350 

-GAC 

II 

TGAATATCAAGGCGAATTTGGTAGTTATGAAGAAGAAGAGGATGAACTCAGAAA 
1970 X 1980 1990 2000 2010 2020 


9. ELLIS-012-FIG2AB.SEQ (1-2350) 


LOCUS ACLRGNAL 1508 bp ds-DNA BCT 15-MAR-1990
DEFINITION A.laidlawii 16S ribosomal RNA small subunit gene.
ACCESSION M23932
KEYWORDS 16S ribosomal RNA; ribosomal RNA small subunit.
SOURCE A.laidlawii (strain JA1) DNA.
ORGANISM Acholeplasma laidlawii
Prokaryotae; Tenericutes; Mollicutes; Mycoplasmas; Mycoplasmatales;
Acholeplasmataceae.
REFERENCE 1 (bases 1 to 1508)
AUTHORS Weisburg,W.G., Tully,J.G., Rose,D.L., Petzel,J.P., Oyaizu,H.,
Yang,D., Mandelco,L., Sechrest,J., Lawrence,T.G., van Etten,J.L.,
Maniloff,J. and Woese,C.R.
TITLE A phylogenetic analysis of the mycoplasmas: Basis for their
classification
JOURNAL J. Bacteriol. 171, 6455-6467 (1989)
STANDARD full automatic
COMMENT Draft entry and computer-readable sequence [1] kindly submitted by
C.R.Woese, 19-JAN-1989.
FEATURES Location/Qualifiers
r-RNA 1..1480
/note="16S ribosomal RNA small subunit (3' end approx.)"
BASE COUNT 433 a 303 c 433 g 339 t


Initial Score = 146 Optimized Score = 568 Significance = 7.53 

Residue Identity = 46% Matches = 679 Mismatches = 633 

Gaps = 161 Conservative Substitutions = 0 


910 920 930 940 950 960 970 

TGAGCTGTGATGTACTATCCTAGGAGATGTGTGGGCCGAAACCGAGAAGCACTAGGACCCCACCATCCT-GT 

II I I I I 

TTTATATGGAGAGTTTGATCCT 
X 10 20 


980 990 1000 1010 1020 1030 1040 

GGAACAGCACAAGCAACCCCACCACCCTGTTCTTACACAT-CATCCTAGATGATGTGTGGGCGCGCACCTCA 

it in i in i i i ii ii i mi ii i hi i ii it 

GGCTCAG — GATGAACGCTGGCGGCGTG-CCTAATACATGCAAGTCGAACGAAGCATCTTCGGATGCTTAG 
30 40 50 60 70 80 90 


1050 1060 1070 1080 1090 1100 1110 

T-CCAAGTCTCTTCTAACGCTAACAT-ATTTGTCTTTACCTTTTTTAAATCTTTTTTTAAATTTAAATTTT 

i i ii mi i i n i i inn n i i n n i i i 1 

TGGCGAACGGGTGAGTAACACGTAGATAACCTACCTTTAACTCGAGGATAACTCCGGGAAACTGGAGCTAAT 
100 110 120 130 140 150 160 

1120 1130 1140 1150 1160 1170 1180 

ATGTGTGTGAGTGTTTTGCCTGCCTG-TATGCACACGTGTGTGTGTGTGTGTGTGTGACA — CTC-CTGAT 

i n i n i i n in n i mi ii i i i n i ii n i 

A-CTG-GATAG-GATGTG — TGCATGAAAAAAACACATTTAAAGATTTATCGGTTTAAGAGGGGTCTGCGGC 
170 180 190 200 210 220 


1190 1200 1210 1220 1230 1240 1250 

GCCTGAGGAGGTCAGAAGAGAAAGGGTTGGTTCCATAAGAACTGGAGTTATGGATGG-CTGTGA-GCCGGNN 
II I II II 1 I I III I I II 1111 II I II II III II I I 

GCATTAGTTAGTTGGTGGGGTAAGAG — CCTACC — AAGACGATGAATCGTAGCCGGACTGAGAGGTCTACC 


230 240 250 260 270 280 290 

1260 1270 1280 1290 1300 1310 1320 


NGATAGGTCGGGACGGAGACCTGTCTTCTTATTTTAACGTGACTGTATAATAAAAAAAAAATGATATTTCGG 

i i i inn mi i n i i i in n i i i i in mm 

GGCCACATTGGGACTGAGAACGGCC— CAAACTCCTACGGGA-GGCA— GCAGTAGGGAAT TTTCGG 

300 310 320 330 340 350 


GAATTGTAGAGATTGTCCTGAC — ACC — CTTCTAGTTAATGATCTAAGAGGAATTGTTGATACGTAGTATA 

in i ii i mill ii i i ii n ii i i n i i i i ii m i 

CAATGGGGGAAA — CCCTGACCGAGCAACGCCGCGTGAACGA-CGA-- AGTACT-- TCGGTATGTAAAGTT 
360 370 380 390 400 410 420 

1400 1410 1420 1430 1440 1450 1460 

C-TGTATATGTGTATGTATATGTATATGTATATATAAGACTCTTTTACT--GTCAAAGTCAACCTAGAGTGTC 

i i mm i i i i i ii i ii i ii i i i i i i i i m m i 

CTTTTATATG-GGAAGAAAAATTAAA— AATTGACGGTACCATATGAATAAGCCCCGGCTAACTA-TGTGCC 
430 440 450 460 470 480 490 

1470 1480 1490 1500 1510 1520 1530 

TGGTTACCAGGTCAATTTTATTGGACATTTTACGTCACACACACACAC — ACACACA-CACACACACGT — 

I I III I II II I III I I I II III I I 11 
AGCAGCCGCGGTAATACATAGGGGGC — GAGCGTTATCCGGATTTACTGGGCGTAAAGGGTGCGTAGGTGG 
500 510 520 530 540 550 560 

1540 1550 1560 1570 1580 1590 1600 

TTATACTACGTACTGTTATCGGTATTCTACGTCATATAATG-GGATAGGGTAAAAGGAAACCA-AAGAGTGA 

mu i i ii i i ii i i i i ini ii i mi i i i imm 

TTATAAAAGTTTGTGGTGTAAGTGCAGTGCTTAACGCTGTGAGGCTATG— AAAACTATATAACTAGAGTGA 


570 580 590 600 610 620 630 

1610 1620 1630 1640 1650 1660 



GTGATATTATTGTGGAGGTGACAGACTACCCCT-- TCTGGGT— ACGTAGGGACAGACCTCCTTCGGACTGT 

i i i mu i i ii i i ii ii i ii m ii i ii i ii 

GACAGAGGCAAGTGGAATTCCATGTGTAGCGGTAAAATGCGTAAATATATGGA-GGAACACC AGTGG 

640 650 660 670 680 690 

1670 1680 1690 1700 1710 1720 1730 

CTAAAAC-TCCCCTTAGAAGTCTCGTCAAGTTCCCGGACGAAGAGGACAGAGGAG-ACACAGTCCGAAAAGT 

I II 1 I I 1 I I I I 1 I lllll II I 1111 1 INI II I 

CGAAGGCGGCTTGCTGGGTCTATACTGACACTGATGCACGAA-AG-CGTGGGGAGCAAACAG— GA T 

700 710 720 730 740 750 

1740 1750 1760 1770 1780 1790 1800 1810 

TATTTTTCCGGCAAATCCTTTCCCTGTTTCGTGACACTCCACCCCTTGTGGACACTTGAGTGTCATCCTTGC 

II I II I I III II I III I II III I I I I II Mil III 

TAGATACCCTGGTAGTCCACGCCGTAAACGATGAGA ACTAAGTGTTGGC-CATAAG-GTCAGTGCTGC 


770 780 790 800 810 820 

1820 1830 1840 1850 1860 1870 


GCCGGAAG-GTCAGGTGGTACCCGTCTGTAG— GGGCGGGGAGACAGAGCCGCGGGGGAGCT-ACGAGAATC 

II I I II II I I 1111 I I II II I I III I III II I 

AGTTAACGCATTAAGTTCTCCGCCTGAGTAGTACGTACGCAAGTATGAAACTCAAAGGAATTGACGGGACCC 

830 840 850 860 870 880 890 

1880 1890 1900 1910 1920 1930 1940 

GACTCACAGGGCGCCCCGGGCTTCGCAAAT-GAA — ACTTTTTTAATCTCA-CAAGTTTCGTC CGG 

I II II I III II I III II II II I II II I I I I I 
CGCACAAGCGGTGGATCATGTTGTTTAATTCGAAGATACACGAAAAACCTTACCAGGTCTTGACATACTCTG 
900 910 920 930 940 950 960 

1950 1960 1970 1980 1990 2000 2010 

GCTCGGCGGACCTATGGCGTC-GATCCTTA-TTACCTTATCCTGGCGCCAAGATAAAACAACCAAAAGCCTT 

III I II II II II I I I III II I I II III 

CAAAGGCTTAGAAATAAGTTCGGAGGCTAACAGATGTACAGGTGGTGCACGGTTGTCGTCAGCTCGTGTCGT 

970 980 990 1000 1010 1020 1030 1040 

2020 2030 2040 2050 2060 2070 

GA — CTCCGGTACTAATTCTCCCTGC — CGGCCCCCGTAAGCATAACGCGGCGATC-TCCACTTTAAGAA 

II I III III II I I I II III II II I I III I I II II 

GAGATGTTGGGT-- TAAGTCCCGCAACGAGCGCAACCCTTATTGCTA— GTTACCATCATTAAGTTGGGGAC 
1050 1060 1070 1080 1090 1100 


CCTGGCCGCGTTCTGCCTGGTCTCGCTT-TCGTAA-ACGGTTCTTAC-AAAAGTAATTA-GTTCTTGCTTTC 

II Ml I Mil I I II I II II I II II I II I I I II I 

TCTAG-CGAG-ACTGCCAGTGATAAATTGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGAC 

1110 1120 1130 1140 1150 1160 1170 

2150 2160 2170 2180 2190 2200 2210 

--AGCCTCCAAGCTTCTGCTAGTCTATGGCAGCATCAAGGCTGGTATTTGCTACGGCTGACC-GCTACGCCG 

I II III I II II I lllll I I III I II I II II Hill I II 

CTGGGCTACAAAC— GTGATA--CAATGGCTGGAACAAAG--AGAA-GCGATA-GGGTGACCTGGAGCGAAA 
1180 1190 1200 1210 1220 1230 1240 

2220 2230 2240 2250 2260 2270 2280 

C-CGCAATAAGGGT-ACTGGGCGGCCCGTCGAAGGCCCTTTGGTTTCAGAAACCCAAGGCCCCCCTCATACC 

I I III II II I I III II II III II II I II I II II 

CTCACAAAAACAGTCTCAGTTCGGATTGGAGTCTGCAACTCGACTCCATGAAGTC— GGAATCGCTAGTA— 

1250 1260 1270 1280 1290 1300 1310 

2290 2300 2310 2320 2330 2340 

AACGTTTCGACTTTGATTCTTGCCG — GTACG — TGGTGGTGGGTGC-CTTAGCTCTT TCTCGATA 

III lllll INI I Ml I I III II I I II I I I III I 

ATCG — CAAATCAGCATGTTGCGGTGAATACGTTCTCGGGGTTTGTACACAGCGCCCGTCAAACCACGAAA 
1320 1330 1340 1350 1360 1370 1380 

X 

GT TAGAC 

II I II 

GTGGGCAATACCCAACGCCGGTGGCCTAACCCGAAAGGGAGGGAGCCGTCTAAGGTAGGGT 
1390 1400 1410 1420 1430 1440 


10. ELLIS-012-FIG2AB.SEQ (1-2350) 

HUMBIND Human binding protein mRNA, partial cds. 

LOCUS HUMBIND 3523 bp ss-mRNA PRI 16-JUN-1993
DEFINITION Human binding protein mRNA, partial cds.
ACCESSION L19597
KEYWORDS binding protein.
SOURCE Homo sapiens adult brain cDNA to mRNA.
ORGANISM Homo sapiens
Eukaryota; Animalia; Chordata; Vertebrata; Mammalia; Theria;
Eutheria; Primates; Haplorhini; Catarrhini; Hominidae.
REFERENCE 1 (bases 1 to 3523)
AUTHORS Vostrov,A.A., Quitschke,W.W., Schwarzman,A.L., Blangy,A., Cuzin,F.,
Wesley,U.V., Hagag,N.G. and Goldgaber,D.
TITLE Cloning of a protein that binds to a recognition sequence in the
APP promoter
JOURNAL Unpublished (1993)
STANDARD full automatic
FEATURES Location/Qualifiers
source 1..3523
/organism="Homo sapiens"
/dev_stage="adult"
/sequenced_mol="cDNA to mRNA"
/tissue_type="brain"
CDS <69..>2154
/note="putative"
/product="binding protein"
/codon_start=1

/translation="TRRGHAATGTAAAAATGRLLLLLLVGLTAPALALAGYIEALAAN 
AGTGFAVAEPQIAMFCGKLNHHVNIQTGKWEPDPTGTK5CFETKEEVLQYCQEMYPEL 
QITNVMEANQRVSIDNWCRRDKKQCKSRFVTPFKCLVGEFVSDVLLVPEKCSFFHKER 
MEVCENHQHWHTVVKEACLTQGMTLYSYGMLLPCGVDQFHGTEYVCCPQTK I IGSVSK 
EEEEEDEEEEEEEDEEEDYDVYKSEFPTEADLEDFTEAAVDEDDEDEEEGEEVVEDRD 
YYYDTFKGDDYNEENPTEPGSDGTHSDKEITHDVKVPPTPLPTNDVDVYFETSADDNE 
HARFQKAKEQLEIRHRNRMDRVKKEWEEAELQAKNLPKAER9TLIQHF8ANVKALEKE 

AACCK-nni UCTUI APIICAMI MtipppMAI CkJVi A At aenppQpuptl AAI PDVUDACMV 



DRLHTIRHY QHVLAVDPEKAAQMKS9VHTHLHV IEERRNQSLSLLYKVPYVAQEIQEE 
IDELLQE0RADMDQFTASISETPVDVRVSSEESEE1PPFHPFHPFPALPENEGSGVGE 
flDGGLIGAEEKVINSKNKVDENHVIDETLDVKENIFNAERVGGLEEERESVGPLREDF 
SLSSSAL IGLLVI AVAIATVI VISLVHLRKRQYGTISHGI VEVDPMLTPEERHLNKHQ 
NHGYENPTYKYLEX" 

BASE COUNT 941 a 804 c 913 g 865 t 
ORIGIN 


Initial Score = 145 Optimized Score = 772 Significance = 7.47 

Residue Identity = 49% Matches = 959 Mismatches = 721 

Gaps = 273 Conservative Substitutions = 0 


490 500 510 520 530 X 540 550 

GCAGGGTTGCAAAACCTGTAGCTTGGGAACATTTAATGACCAGAACGGTACTGGCGTCTGTCGACCCTGGAC 


CCAAACTACGT— GCGGTGTGC 
X 10 20 


560 570 580 590 600 610 620 

GAACTGCTCTCTAGACGGAAGGTCTGTGCTTAAGA-CCG-GGACCACGGAGA-AGGACGTGGTGTGTGGACC 

II II II I II II II till III III I I I ill I II II II I II 
TAA— GC-GAGGAGTCCGAGTGTGTGAGCTTGAGAGCCGCGCGCTAGAGCGACCCGGCGAGGGATGGCGGCC 
30 40 50 60 70 80 


630 640 650 660 670 680 690 

CCCTGTGGTGAGCTTCTCTCCCAGTACCACCATTTCTGTGACTCCAGAGGGAGGACCAGGAGGGCACTCCTT 

ii ii ii i i iii mi i i mi i i i n in in 

ACC--GGGACCGCGGCCGCCGCAG-- CCAC--GGGCAG— GCTCCTGCTTCTGCTGCTGGTGGG-GCTCACG 
90 100 110 120 130 140 150 

700 710 720 730 740 750 760 

GCAGGTCCTTACCTTGTTCCTGGC— GCTGACATC— GGCT-TTGCTGC — TGGCCCTGATCTTCATTACT 

n i i i inn him in inn mi mi n m n n i in 

GC— GCC--TGCCTTGGCGCTGGCCGGCT-ACATCGAGGCTCTTGCAGCCAATG— CCGGAACAGGATT — 
160 170 180 190 200 210 

770 780 790 800 810 820 

CTCCTGTTCTCTGTG-CTCAAATGG — AT — CAGGAAAAAATTCCCCCACATATTCAAGCAACCATTTAAG 

i inn m i mini in inn i mi n i i mi i 

-TGCTGTT-GCTGAGCCTCAAATCGCAATGTTTTGTGGGAAGTT AAATATGCATGTGAACATTCAGA 

220 230 240 250 260 270 

830 840 850 860 870 880 890 

AAGACCACTGG— AGCAGCT-CAAGAGGAAGATGCTTGTAGCTGCCGATGTCCACAGGAAGAAGAAGGAGGA 

i i in i i i i m in i ii inn i n in iimiii I 

CTGGGAAATGGGAACCTGATCCAACAGGCA — CCAAG-AGCTG-CTTTG-AAACA-AAAGAAGAA-GTTCT 
280 290 300 310 320 330 340 


900 910 920 930 940 950 960 

GGAGGA-GGCTATGAGCTGTGAT — GTACTA-TCCTAGGAGATGTG-TGG-GCCGAAACCGAGAAGCACTA 

n i i i in in n i in ii inn m i i n i i i i ii 

TCAGTACTGTCAGGAGATGT-ATCCAGAGCTACAGATCACAAATGTGATGGAGGCAAACCAGCGGGTTAGTA 
350 360 370 380 390 400 410 

970 980 990 1000 1010 1020 1030 

-GGACCCCACCATCCTGTGGAACAGCACAAGCAACCCCACCACCCTGTTCTTACACATCATCCTAGATGATG 

in i n i ii in i mm ii i n mm i i i ini 

TTGACAACTGGTGCCGGAGGGACA — AAAAGCAATGCAAGAGTCGCTTTGTTACAC — CTTTCAAG-TGTCT 


420 430 440 450 460 470 480 

1040 1050 1060 1070 1080 1090 1100 


TGTGGGCGCGCACCTCATCCAAGTCTCTTCTAACGCTAACATATTTGTCTTTACCTTTTTTAAATCTTTTTT 

mu i ii i in n in i ii i n i i mi i i 

CGTGGGTGAATTTGTAAGTGATGTC-CTGCTAGTTC-CAGAAAAGTGCC — AGTTTTTCCACA 

490 500 510 520 530 540 



1110 1120 1130 1140 1150 1160 

TAAATTTAAATTTTATGTGTGTGAGTGTTTTGC-CTG CCTGTATGC--ACACGTGTGTGTGTGTGTGT 

I I I llllll II I II III I III I III III II I I 
AAGAGCGGATGGAGGTGTGTGAGAATCACCAGCACTGGCACACGGTAGTCAAAGAGGCATGTCTGACTCAGG 
550 560 570 580 590 600 610 

1170 1180 1190 1200 1210 1220 1230 

GTGTGACACTCCTGATGCCTGAGG-AGG T CAGAAGAGAAAGGGTTG GTTCCATAAGAACTGGAG 

i mi ii i ii ii ii i i ii i i i mi i mini i mi i 

GAATGAC-CTTAT-ATAGCTACGGCATGCTGCTCCCATGTG — GGGTAGACCAGTTCCAT-GGCACTGAA- 
620 630 640 650 660 670 

1240 1250 1260 1270 1280 1290 

TTATGGATG — GCTGTGAG-CCGGNNNGATAGG-TCGG GACGGAGACCTGTCTTCTTATTTTAA 

. mi n n i n i n n n i n i n i ii i 

-TATGTGTGCTGCCCTCAGACAAAGATTATTGGATCTGTGTCAAAAGAAGAGGAAGAGGAAGATGAAGAGGA 


680 690 700 710 720 730 740 

1300 1310 1320 1330 1340 1350 


CGTG— ACTGTATAATAAA AAAAAAATGATATTTCGGGAATTGTAGAGATTGTCCTGACACCCTTCTA 

I I I I I II II II I lilll III II II 1 1 I 1111 I III 

AGAGGAAGAGGAAGATGAAGAGGAAGACTATGATGTTT — ATAAAAGTGAATTTCCTACTGAAGCAGATCTG 
750 760 770 780 790 800 810 

1360 1370 1380 1390 1400 1410 1420 

GTTAATGA — TCTAAG-AGGAATTGTTGATACGTAGTATACTGTATATGTGTATGTATATGTATATGT-ATA 

I II II II II II I III III II II II III I I I I I I II I 

G — AA-GACTTCACAGAAGCAGCTGTGGAT — GAGGATGATGAGGATGAGGAAGAAGGGGAGGAAGTGGTG 


820 830 840 850 860 870 880 

1430 1440 1450 1460 1470 1480 1490 


TATAA— GACTCTTTTAC— TGTCAAAGTCAACCTAGAGTGTCTGGTTACCAGGTCAATTTTATTGGACATT 

I I II I III II II 1111 III II II I III III II II II 
GAGGACCGAGATTACTACTATGACACCTTCAAAGGAGA-TGACT-ACAATGAGGAGAATCCTACTGAACCCG 
890 900 910 920 930 940 950 

1500 1510 1520 1530 1540 1550 1560 

TTA-CGTC-ACACACACACACACACACACA-CACACACGTTTATACTACGTACTGTTATCGGTATTCT-ACG 

I II I III II III I I II II I I I I II I III III I 
GCAGCGACGGCACCATGTCAGACAAGGAAATTACTCATGAT--GTCAAAGTTCCTCCAAC— TCCTCTGCCA 
960 970 980 990 1000 1010 1020 

1570 1580 1590 1600 1610 1620 

TCATATAATG-GGATAGGGTAAAAGGAAACCAAAGAGTGAGTGAT-ATTATTGTG-GAGGTGACAGA — CT 

I II III III I III II III I II 1111 II I II II 1111 II 

ACCAATGATGTTGAT-GTGTATTTCGAGACCTCTG-CAGA-TGATAATGAGCATGCTCGCTTCCAGAAGGCT 

1030 1040 1050 1060 1070 1080 1090 

1630 1640 1650 1660 1670 1680 1690 

A — CCCCTTCTGGGTACGTAGGGACAG — ACCTCCTTCGGACTGTCTAAAACTCCCCTTAGAAGTCTCGTCA 

i i mi i i n n i in i nil i tin i nil i n 

AAGGAGCAGCTGGAGA-TTCGGCACCGCAACCGAAT--GGACAGGGTAAAGAAGGAATGGGAAG — AGGCA 
1100 1110 1120 1130 1140 1150 

1700 1710 1720 1730 1740 1750 1760 

-AGTTCCCGGACGAAGA GGACAGAGGAGACACAGTCCGAAAAGTTATTTTTCCGGCAAATCCTTTCCC 

II I I II 1111 II II II I II I I I 1 III I III III 11 

GAGCTTCAAG-CTAAGAACCTCCCCAAAGCAG — AGAGGCAG-ACTCTGATT CAGCACTTCCAAGCCA 

1160 1170 1180 1190 1200 1210 1220 

1770 1780 1790 1800 1810 1820 

TGTTTCGTGACACTCCA-CCCCTTGTGGACACTTGAGTGTCATC— CTTGCG CCGGAAGGTCAG-GT 

II II I I I I I I II illl II I II I I II II I I II 
TGGTTAAAG-CTTTAGAGAAGGAAGCAGCCA-GTGAGAAGCAGCAGCTGGTGGAGACCCACCTGGCCCGAGT 
1230 1240 1250 1260 1270 1280 1290 



1830 1840 1850 1860 1870 1880 1890 

GGTACCCGT-CTGTAGGGGCGGGGAGACAGA-GCCGCGGGGGAGCTACGAGAATCGACTCACAGGGCGCCCC 

II I I I III I I I I I I II I I I II II INI I I I II III I III 

GGAAGCTATGCTGAATGACC — GCCGTCGGATGGCTCTGGAGAACTACCTGGCT-GCCTTGCAGTCTGACCC 
1300 1310 1320 1330 1340 1350 1360 

1900 1910 1920 1930 1940 1950 1960 

GGGCTTCG--CAAATGAAACTTTTTTAATCTCAC-AAGTTTCGTCCGGGCTCGGCGGACCTATGGCGTCGAT 

II II I II III II II III lllll III I I II I III 

--GCCACGGCCTCATCGCATTCTCCAGGCCTTACGGCGTTATGTCCGTGCT--GAGAACAAA GAT 

1370 1380 1390 1400 1410 1420 

1970 1980 1990 2000 2010 2020 2030 

C-CTTA-TTACCTTATCCTGGCGCCAAGATAAAACAACCAAAAGC-CTTGACTCCGGTACTAATTC-TCCCT 

I till Mil III II II I I II II lllll II I I II I III 

CGCTTACATACC— ATC CGTCATTA-CCAGCATGTGTTGGCTGTTGAC-CCAGAA— AAGGCGGCCCA 

1430 1440 1450 1460 1470 1480 

2040 2050 2060 2070 2080 2090 

GCCGGCCCCCGTAAGCATAACGCGGCGATCTCCAC TTTAAGAACCTGGCCGCGTTCTGCCTGGTCTC- 

ii ii ii it i immi it iiiii i i mi mi 

GATGAAATCCCAGGTGATGACAC ATCTCCACGTGATTGAAGAAAGGAGGAAC-CAAAGCCT-CTCTCT 

1490 1500 1510 1520 1530 1540 

2100 2110 2120 2130 2140 2150 2160 

GCTTTCGTAA-- ACGGT-TCTTACAAAAGTAATT — AG — TTCTTGCTTTCAGCCTCCA— AGCTTC-TGC 

in i n n i i i i in mi n in i in n in i in 

GCTCTACAAAGTACCTTATGTAGCCCAAGAAATTCAAGAGGAAATTGATGAGCTCCTTCAGGAGCAGCGTGC 
1550 1560 1570 1580 1590 1600 1610 1620 

2170 2180 2190 2200 2210 2220 

TAGTCTATGG — CAGCATCAAGGCTGGTATTT— GCTACGGCTGACCGC-TACGCCGCCGCAATAAGGGTAC 

11 lllll III III II II I I II III I I II II I III I 
-AG-ATATGGACCAG-TTCACTGCCTCAATCTCAGAGACCCCTGTGGACGTCCGGGTGAGCTCTGAGGAGAG 
1630 1640 1650 1660 1670 1680 1690 

2230 2240 2250 2260 2270 2280 2290 

TG-GGCGGCCCGTCGAAGGCCCTTTGGTTTCA-GAAACCCAAGGCCCCCCTCA-TACCAACGTT-TCGACTT 

II II I II I I II II II II II II III I II II I I I II I 

TGAGGAGATCC — CACCGTTCCACCCCTTCCACCCCTTCCCAGCCCTACCTGAGAACGAAGGATCTGGAGTG 

1700 1710 1720 1730 1740 1750 1760 

2300 2310 2320 2330 2340 X 

TGATTCTTGCCGGTACGTGG — TGGTGGGTGCC — TTAGCTCTTTCTCGATAGTTAGAC 

n n n i n n i linn n mi in in 

GGA — GAGCAGGATGGGGGACTGATCGGTGCCGAAGAGAAAGTGATTAACAGTAAGAATAAAGTGGATGAA 
1770 1780 1790 1800 1810 1820 

AACATGGTCATTGACGAGACTCTGGATGTTAAGGAAA 
1830 1840 1850 1860 


11. ELLIS-012-FIG2AB.SEQ (1-2350) 

HSBIND Human binding protein mRNA, partial cds. 

ID HSBIND standard; RNA; PRI; 3523 BP.
XX
AC L19597;
XX
DT 18-JUN-1993 (Rel. 36, Created)
DT 18-JUN-1993 (Rel. 36, Last updated, Version 1)
XX
DE Human binding protein mRNA, partial cds.
XX
KW binding protein.
OS Homo sapiens (human)
OC Eukaryota; Animalia; Metazoa; Chordata; Vertebrata; Mammalia;
OC Theria; Eutheria; Primates; Haplorhini; Catarrhini; Hominidae.
XX
RN [1]
RP 1-3523
RA Vostrov A.A., Quitschke W.W., Schwarzman A.L., Blangy A.,
RA Cuzin F., Wesley U.V., Hagag N.G., Goldgaber D.;
RT "Cloning of a protein that binds to a recognition sequence in
RT APP promoter";
RL Unpublished.
XX
FH Key Location/Qualifiers
FH
FT CDS <69..>2154
FT /product="binding protein"
FT /note="putative"
FT /codon_start=1
FT source 1..3523
FT /organism="Homo sapiens"
FT /dev_stage="adult"
FT /sequenced_mol="cDNA to mRNA"
FT /tissue_type="brain"
XX
SQ Sequence 3523 BP; 941 A; 804 C; 913 G; 865 T; 0 other;


Initial Score = 145 Optimized Score = 772 Significance = 7.47 

Residue Identity = 49% Matches = 959 Mismatches = 721 

Gaps = 273 Conservative Substitutions = 0 

490 500 510 520 530 X 540 550 

GCAGGGTTGCAAAACCTGTAGCTTGGGAACATTTAATGACCAGAACGGTACTGGCGTCTGTCGACCCTGGAC 

I I II I I! I 

CCAAACTACGT— GCGGTGTGC 
X 10 20 


560 570 580 590 600 610 620 

GAACTGCTCTCTAGACGGAAGGTCTGTGCTTAAGA-CCG-GGACCACGGAGA-AGGACGTGGTGTGTGGACC 

II II II I II II II till III III I I I I II I II II II I II 
TAA — GC-GAGGAGTCCGAGTGTGTGAGCTTGAGAGCCGCGCGCTAGAGCGACCCGGCGAGGGATGGCGGCC 
30 40 50 60 70 80 


630 640 650 660 670 680 690 

CCCTGTGGTGAGCTTCTCTCCCAGTACCACCATTTCTGTGACTCCAGAGGGAGGACCAGGAGGGCACTCCTT 

ii it ii i i iii mi i i mi i i i n in in 

ACC— GGGACCGCGGCCGCCGCAG— CCAC— GGGCAG— GCTCCTGCTTCTGCTGCTGGTGGG-GCTCACG 
90 100 110 120 130 140 150 

700 710 720 730 740 750 760 

GCAGGTCCTTACCTTGTTCCTGGC — GCTGACATC— GGCT-TTGCTGC — TGGCCCTGATCTTCATTACT 

n i i i inn inn in inn nil nil n n n n i in 

GC— GCC— TGCCTTGGCGCTGGCCGGCT-ACATCGAGGCTCTTGCAGCCAATG— CCGGAACAGGATT — 
160 170 180 190 200 210 

770 780 790 800 810 820 

CTCCTGTTCTCTGTG-CTCAAATGG— AT — CAGGAAAAAATTCCCCCACATATTCAAGCAACCATTTAAG 

i inn in i mini i n i n n i nil n i i nil i 

-TGCTGTT-GCTGAGCCTCAAATCGCAATGTTTTGTGGGAAGTT AAATATGCATGTGAACATTCAGA 

220 230 240 250 260 270 

830 840 850 860 870 880 890 

AAGACCACTGG — AGCAGCT-CAAGAGGAAGATGCTTGTAGCTGCCGATGTCCACAGGAAGAAGAAGGAGGA 

i i in i i i i in mi ii inn i n in nnnn i 

CTGGGAAATGGGAACCTGATCCAACAGGCA — CCAAG-AGCTG-CTTTG-AAACA-AAAGAAGAA-GTTCT 
280 290 300 310 320 330 340 



900 910 920 930 940 950 960 

GGAGGA-GGCTATGAGCTGTGAT — GTACTA-TCCTAGGAGATGTG-TGG-GCCGAAACCGAGAAGCACTA 

II I I I III III II I III II lllll III I I II I I I I II 
TCAGTACTGTCAGGAGATGT-ATCCAGAGCTACAGATCACAAATGTGATGGAGGCAAACCAGCGGGTTAGTA 
350 360 370 380 390 400 410 

970 980 990 1000 1010 1020 1030 

-GGACCCCACCATCCTGTGGAACAGCACAAGCAACCCCACCACCCTGTTCTTACACATCATCCTAGATGATG 

iii i ii i ii iii i mm ii i ii mm i n n u 

TTGACAACTGGTGCCGGAGGGACA-- AAAAGCAATGCAAGAGTCGCTTTGTTACAC--CTTTCAAG-TGTCT 


420 430 440 450 460 470 480 

1040 1050 1060 1070 1080 1090 1100 


TGTGGGCGCGCACCTCATCCAAGTCTCTTCTAACGCTAACATATTTGTCTTTACCTTTTTTAAATCTTTTTT 

mu i ii i iii ii hi i i i i ii i i mi i i 

CGTGGGTGAATTTGTAAGTGATGTC-CTGCTAGTTC-CAGAAAAGTGCC — AGTTTTTCCACA 


490 500 510 520 530 540 

1110 1120 1130 1140 1150 1160 


TAAATTTAAATTTTATGTGTGTGAGTGTTTTGC-CTG CCTGTATGC — ACACGTGTGTGTGTGTGTGT 

i i i mm ii i ii m i iii i in m it i i 

AAGAGCGGATGGAGGTGTGTGAGAATCACCAGCACTGGCACACGGTAGTCAAAGAGGCATGTCTGACTCAGG 
550 560 570 580 590 600 610 

1170 1180 1190 1200 1210 1220 1230 

GTGTGACACTCCTGATGCCTGAGG-AGG T CAGAAGAGAAAGGGTTG GTTCCATAAGAACTGGAG 

i mi ii i ii ii ii i i n i i i mi i mini i mi i 

GAATGAC-CTTAT-ATAGCTACGGCATGCTGCTCCCATGTG — GGGTAGACCAGTTCCAT-GGCACTGAA- 
620 630 640 650 660 670 

1240 1250 1260 1270 1280 1290 

TTATGGATG — GCTGTGAG-CCGGNNNGATAGG-TCGG GACGGAGACCTGTCTTCTTATTTTAA 

INI II II I II I INI II I II I II I II I 

-TATGTGTGCTGCCCTCAGACAAAGATTATTGGATCTGTGTCAAAAGAAGAGGAAGAGGAAGATGAAGAGGA 


680 690 700 710 720 730 740 

1300 1310 1320 1330 1340 1350 


CGTG — ACTGTATAATAAA AAAAAAATGATATTTCGGGAATTGTAGAGATTGTCCTGACACCCTTCTA 

i i i i i ii ii ii i mu iii ii n i i i mi i in 

AGAGGAAGAGGAAGATGAAGAGGAAGACTATGATGTTT--ATAAAAGTGAATTTCCTACTGAAGCAGATCTG 
750 760 770 780 790 800 810 

1360 1370 1380 1390 1400 1410 1420 

GTTAATGA — TCTAAG-AGGAATTGTTGATACGTAGTATACTGTATATGTGTATGTATATGTATATGT-ATA 

I II II II II II I III III II II II III I I I I I I II I 

G — AA-GACTTCACAGAAGCAGCTGTGGAT — GAGGATGATGAGGATGAGGAAGAAGGGGAGGAAGTGGTG 


820 830 840 850 860 870 880 

1430 1440 1450 1460 1470 1480 1490 


TATAA — GACTCTTTTAC— TGTCAAAGTCAACCTAGAGTGTCTGGTTACCAGGTCAATTTTATTGGACATT 

i i n i in n n mi in n n i in in n n n 

GAGGACCGAGATTACTACTATGACACCTTCAAAGGAGA-TGACT-ACAATGAGGAGAATCCTACTGAACCCG 
890 900 910 920 930 940 950 

1500 1510 1520 1530 1540 1550 1560 

TTA-CGTC-ACACACACACACACACACACA-CACACACGTTTATACTACGTACTGTTATCGGTATTCT-ACG 

I II I III II III I I II II I I 1 I II I III III I 
GCAGCGACGGCACCATGTCAGACAAGGAAATTACTCATGAT — GTCAAAGTTCCTCCAAC— TCCTCTGCCA 
960 970 980 990 1000 1010 1020 

1570 1580 1590 1600 1610 1620 

TCATATAATG-GGATAGGGTAAAAGGAAACCAAAGAGTGAGTGAT-ATTATTGTG-GAGGTGACAGA — CT 

I II III III I III II III I II till II I II II 1111 II 
ACCAATGATGTTGAT-GTGTATTTCGAGACCTCTG-CAGA-TGATAATGAGCATGCTCGCTTCCAGAAGGCT 
1030 1040 1050 1060 1070 1080 1090 



1630 1640 1650 1660 1670 1680 1690 

A--CCCCTTCTGGGTACGTAGGGACAG--ACCTCCTTCGGACTGTCTAAAACTCCCCTTAGAAGTCTCGTCA 

i i mi i I i! ii i iii i mi i mi i mi i ii 

AAGGAGCAGCTGGAGA-TTCGGCACCGCAACCGAAT — GGACAGGGTAAAGAAGGAATGGGAAG — AGGCA 
1100 1110 1120 1130 1140 1150 

1700 1710 1720 1730 1740 1750 1760 

-AGTTCCCGGACGAAGA GGACAGAGGAGACACAGTCCGAAAAGTTATTTTTCCGGCAAATCCTTTCCC 

III 1 II llll II II II 1 II I I I I III I III III II 

GAGCTTCAAG-CTAAGAACCTCCCCAAAGCAG— AGAGGCAG-ACTCTGATT CAGCACTTCCAAGCCA 

1160 1170 1180 1190 1200 1210 1220 

1770 1780 1790 1800 1810 1820 

TGTTTCGTGACACTCCA-CCCCTTGTGGACACTTGAGTGTCATC — CTTGCG CCGGAAGGTCAG-GT 

ii ii iiii iiii mi ii i ii ii ii ii ii ii 

TGGTTAAAG-CTTTAGAGAAGGAAGCAGCCA-GTGAGAAGCAGCAGCTGGTGGAGACCCACCTGGCCCGAGT 
1230 1240 1250 1260 1270 1280 1290 

1830 1840 1850 1860 1870 1880 1890 

GGTACCCGT-CTGTAGGGGCGGGGAGACAGA-GCCGCGGGGGAGCTACGAGAATCGACTCACAGGGCGCCCC 

III I I III! I I I I I II I 1 I II II IIII I II 11 III 1 III 

GGAAGCTATGCTGAATGACC— GCCGTCGGATGGCTCTGGAGAACTACCTGGCT-GCCTTGCAGTCTGACCC 
1300 1310 1320 1330 1340 1350 1360 

1900 1910 1920 1930 1940 1950 1960 

GGGCTTCG— CAAATGAAACTTTTTTAATCTCAC-AAGTTTCGTCCGGGCTCGGCGGACCTATGGCGTCGAT 

II II I II III II II III IIII! II! II II I III 

--GCCACGGCCTCATCGCATTCTCCAGGCCTTACGGCGTTATGTCCGTGCT--GAGAACAAA GAT 

1370 1380 1390 1400 1410 1420 

1970 1980 1990 2000 2010 2020 2030 

C-CTTA-TTACCTTATCCTGGCGCCAAGATAAAACAACCAAAAGC-CTTGACTCCGGTACTAATTC-TCCCT 

I IIII III! Ill II II I I II II Hill IIII II I III 

CGCTTACATACC— ATC CGTCATTA-CCAGCATGTGTTGGCTGTTGAC-CCAGAA — AAGGCGGCCCA 

1430 1440 1450 1460 1470 1480 

2040 2050 2060 2070 2080 2090 

GCCGGCCCCCGTAAGCATAACGCGGCGATCTCCAC TTTAAGAACCTGGCCGCGTTCTGCCTGGTCTC- 

II II HIM 1 1 1 H 1 1 1 II Hill I I III! IIII 

GATGAAATCCCAGGTGATGACAC ATCTCCACGTGATTGAAGAAAGGAGGAAC-CAAAGCCT-CTCTCT 

1490 1500 1510 1520 1530 1540 

2100 2110 2120 2130 2140 2150 2160 

GCTTTCGTAA — ACGGT-TCTTACAAAAGTAATT— AG — TTCTTGCTTTCAGCCTCCA — AGCTTC-TGC 

Hi i ii ii ii i i in mi ii m i m ii m i hi 

GCTCTACAAAGTACCTTATGTAGCCCAAGAAATTCAAGAGGAAATTGATGAGCTCCTTCAGGAGCAGCGTGC 
1550 1560 1570 1580 1590 1600 1610 1620 

2170 2180 2190 2200 2210 2220 

TAGTCTATGG— CAGCATCAAGGCTGGTATTT — GCTACGGCTGACCGC-TACGCCGCCGCAATAAGGGTAC 

ii mu hi hi n ii i i it m i i ii ii i in i 

-AG-ATATGGACCAG-TTCACTGCCTCAATCTCAGAGACCCCTGTGGACGTCCGGGTGAGCTCTGAGGAGAG 
1630 1640 1650 1660 1670 1680 1690 

2230 2240 2250 2260 2270 2280 2290 

TG-GGCGGCCCGTCGAAGGCCCTTTGGTTTCA-GAAACCCAAGGCCCCCCTCA-TACCAACGTT-TCGACTT 

II II I II I I II II II I! II II III I II II I I I II I 

TGAGGAGATCC — CACCGTTCCACCCCTTCCACCCCTTCCCAGCCCTACCTGAGAACGAAGGATCTGGAGTG 
1700 1710 1720 1730 1740 1750 1760 

2300 2310 2320 2330 2340 X 

TGATTCTTGCCGGTACGTGG — TGGTGGGTGCC — TTAGCTCTTTCTCGATAGTTAGAC 

ii ii it i h ii i mm ii mi m hi 

GGA — GAGCAGGATGGGGGACTGATCGGTGCCGAAGAGAAAGTGATTAACAGTAAGAATAAAGTGGATGAA 
1770 1780 1790 1800 1810 1820 



AACATGGTCATTGACGAGACTCTGGATGTTAAGGAAA 
1830 1840 1850 1860 


12. ELLIS-012-FIG2AB.SEQ (1-2350) 

HSHB15RNA Homo sapiens mRNA for HB15 


LOCUS HSHB15RNA 1761 bp RNA PRI 31-JUL-1992
DEFINITION Homo sapiens mRNA for HB15
ACCESSION Z11697
KEYWORDS HB15 gene; immunoglobulin superfamily.
SOURCE human
ORGANISM Homo sapiens
Eukaryota; Animalia; Metazoa; Chordata; Vertebrata; Mammalia;
Theria; Eutheria; Primates; Haplorhini; Catarrhini; Hominidae.
REFERENCE 1 (bases 1 to 1761)
AUTHORS Zhou,L., Schwarting,R., Smith,H.M. and Tedder,T.F.
TITLE A novel cell-surface molecule expressed by human interdigitating
reticulum cells, Langerhans cells and activated lymphocytes that is
a new member of the immunoglobulin superfamily
JOURNAL J. Immunol. 149, 735-742 (1992)
STANDARD full automatic
REFERENCE 2 (bases 1 to 1761)
AUTHORS Tedder,T.F.
TITLE Direct Submission
JOURNAL Submitted (11-FEB-1992) T.F. Tedder, Division of Tumor Immunology,
Dana-Farber Cancer Institute/Harvard Medical School, 44 Binney St.,
Boston, MA, 02115-6084, USA
STANDARD full automatic
COMMENT *source: tissue=human tonsil;
*source: cell_type=lymphocyte;
*source: clone_library=cDNA library in lambda gt-11; *source:
clone=pHB15;
*source: is_macronuclear=N;
*source: is_proviral=N;
*source: is_germline=N.

FEATURES Location/Qualifiers
sig_peptide 11..67
mat_peptide 68..625
/product="HB15"
/note="proposed amino-terminus of mature protein product"
polyA_signal 1248..1253
CDS 11..628
/evidence=EXPERIMENTAL
/note="a cell-surface molecule expressed by
interdigitating reticulum cells, Langerhans cells and
activated lymphocytes. A member of the immunoglobulin
superfamily"
/product="HB15"
/codon_start=1

/translation="MSRGLQLLLLSCAYSLAPATPEVKVACSEDVDLPCTAPHDPQVP 
YTVSWVKLLEGGEERMETPSEDHLRGQHYHQKGQNGSFDAPNERPYSLKIRNTTSCNS 
GTYRCTLQDPDGSRNLSGKVILRVTGCPASRKEETFKKYRAEIVLLLALVIFYLTLI I 
FTCKFARLQSIFPDFSKAGMERAFLPVTSPNKHLGLVTPHKTELV" 

BASE COUNT 453 a 399 c 442 g 467 t 
ORIGIN 


Initial Score = 143 Optimized Score = 697 Significance = 7.33
Residue Identity = 47% Matches = 843 Mismatches = 719
Gaps = 207 Conservative Substitutions = 0


X 10 20 

ATGTCC-ATGAACTGCTGAGTG 


TGAGCTGCGCCTACAGCCTGGCTCCCGCGACGCCGGAGGTGAAGGTGGCTTGCTCCGAAGATGTGGACTTGC 


30 40 50 60 70 80 

GATAAACAGCAC — GGGATATCTCTGT — CTA-AAGGAATATTACTACACCAGGAAAAGGACACATTCGA 

I II II I Mill II III I II I I II II I III II 
CCTGCACCGCCCCCTGGGATCCGCAGGTTCCCTACACGGTCTCCTGGGTCA— AGTTATTGGAGGGTGGTGA 
120 130 140 150 160 170 180 


90 100 110 120 130 140 150 

CAACAGGAAAGGAGCCTGTCACAGAAAACCA — CAGTGTCCTGTGCATGTGACATTTCGCCATGGGAAACA 

i iii i mi i i in mi in ii in i in i i in n i 

AGAGAGG-ATGGAGACACCCCAGGAAGACCACCTCAGGGGAC — AGCA-CTATCAT — CAGAAGGGGCAA-A 
190 200 210 220 230 240 

160 170 180 190 200 210 220 

ACTGTTACAACGTGGTGGTCATTGTGCTGCTGCTAGTGGGCTG-TGA — GAA — GGTGGGAGCCGTGCAGA 

I III II I II II II III I III II III 1 III III I 

ATGGTTCTTTCGACGCCCCCAATGAAAGGC-CCTA-TTCCCTGAAGATCCGAAACACTACCAGC — TGC — A 

250 260 270 280 290 300 310 

230 240 250 260 270 280 

ACTC-CTGTGATAACTGT-CAGCCTG — GTAC — TTTCTGCAGAAAATACAATCCAGTCTGCAAGAGCTGC 

mi i in n n in i n i inn n i i in in n n 

ACTCGGGGACATACAGGTGCACTCTGCAGGACCCGGATGGGCAGAGAAACCTA — AGT-GGCA— AGGTGA 
320 330 340 350 360 370 

290 300 310 320 330 340 350 

CCTCCA-AGT-AC — CTTCTCCAGCATAG-GTGGACAGCCGAACTGTAACATCTGCAGAGTGTGTGCAGG-- 

n i in n i n in n n i i n I m i i inn in i 

TCTTGAGAGTGACAGGATGCCCTGCACAGCGTAAAGAAGAGACTTTTAAGAAATACAGAGCG-GAGATTGTC 
380 390 400 410 420 430 440 

360 370 380 390 400 410 420 

CTATTTCAGGTTCAAGAAGTTTTGCTCCTCTACCCACA— ACGCGGAGT-GTGAGTGCATTGAAGGATTCCA 

II I I II II I till II II II I II I II II III III I I I II 

CTGCTGCTGGCTCTGGTTATTTT-CTACTTAACACTCATCATTTTCACTTGTAAGT — TTGCACGGCTACA 
450 460 470 480 490 500 510 

430 440 450 460 470 480 

TTGCTTGGGGCCACAGTGCACCAGATGTGAAAAGGACTGCAGGCCT — GGCCAG-GAGCT AACGAAGC 

ii in i i i i i n i in i n i mi i n n nn 

GAGTATCTTCCGAGATTTTTCTAAAGCTGGCATGGAACG-AGCTTTTCTCCCAGTTACCTCCCCAAATAAGC 
520 530 540 550 560 570 580 


490 500 510 520 530 540 550 

AGGG-TTGCAAAACCTGTAGCTTGGGAACATTTAATGACCAGAACGGTACTGGCGTCTGTCGACCCTG 

nil i i i in nil i nn m i i in i n nn 

ATTTAGGGCTAGTGACTCCTCACAAGACAGAAC-TGGTATGAGCAGGA— TTTCTGCAGGTTCTTCTTCCTG 
590 600 610 620 630 640 650 


560 570 580 590 600 610 620 

GACGAACTGCTCTCTAGACGGAAGGTCTGTGCTTAAGACCGGGACCACGGAGAAGGA-CGT--GGTG-TGTG

i nil n n i nn i i n n i nn n i i in i 

AAGCTGAGGCTC — AG-GGGTGTGCCTGTCTGTTACACTGGAGGAGAGAAGAATGAGCCTACGCTGAAGAT 
660 670 680 690 700 710 720 

630 640 650 660 670 680 690 

GACCCCCTGTGGTGAGCTTCTCTCCCAGTACCACCATTT — CTGTGACTCCAGAGGGAGGACCAGGAGGGCA 


GGCATCCTGTGAAGTCCTTCAC-CTCACTGAAAACATCTGGAAGGGGATCCCACCCCATTTTC-TGTGGGCA 
730 740 750 760 770 780 790 

700 710 720 730 740 750 

CTCCTTGCAGGTCCTTACCT-TGTTCCTGGCGCTGA — CATCGGCT-TTGCTGC-TGGC — CCTGATCTTCA 


in i i ii ii i i i n in i i in i n i nil m n i 

GGCCTCGAAAACCATCACATGACCACATAGC-ATGAGGCCACTGCTGCTTCTCCATGGCCACCTTTTCAGCG 


760 770 780 790 800 810 820 

TTACTCTCCTGTTCTCTGTGCTCAAATGGATCAGGAAAAAATTCCCCCACATATTCAAGCAACCATTTAAGA 

I I I I I II I II INI II I I Ml I lllll till I I III 
AT-GTATGCAG — CTATCTGGTCAA — CCTCCTGGACATTTTTTCAGTCATATAAAAGCTA--TGGTGAGA 
870 880 890 900 910 920 


830 840 850 860 870 880 890 

AGACCACTGGAGCA-GCTCAAGAGGAAGATGCTTGTAGCTGCCGATGTCCACAGGAAGAAGAAGGAGGAGGA 

I I lllll I I II I I II III II I I I II II I III lllll I 

TG-CAGCTGGAAAAGGGTCTTGGGAAATATGAATG — CCCCCAGCTGGCCCGTGACAGACTCCTGAGGA-CA 
930 940 950 960 970 980 990 

900 910 920 930 940 950 

GGAGGC TATG-AGCTGTGATGTACTATCCTAGGA — GATGTG-TGGGCCG — AAACCGAGAAGCAC 

III I II I II II I I III II llll I II I I II III I 
GCTGTCCTCTTCTGCATCT-TGGGGACATCTCTTTGAATTTTCTGTGTTTTGCTGTACCAGCCCAGATGTTT 
1000 1010 1020 1030 1040 1050 1060 

960 970 980 990 1000 1010 1020 

TAGGACCCCACCATCCTGTGGAACAGCACAAGCAACCCCACCACCCTG — TTCTT A-CAC ATCATCCT AGA 

n i i i n mi inn in ii in n n n i 

TACGTCTGGGAGAAATTG ACAGATCAAGCTGTGAGA-CAGTGGGAAATATTTAGCAAATAAT— TTCC 

1070 1080 1090 1100 1110 1120 1130 

1030 1040 1050 1060 1070 1080 1090 

TGATGTG-TGGGCGCGC-ACCTCATCCAAGT-CTCT-TCTAACGCTAACATATTTGTC-TTTACCTTTTTTA 

II llll II I II I I III III I II II I III I I II II 
TGGTGTGAAGGTCCTGCTATTACTAAGGAGTAATCTGTGTACAAAGAAATAACAAGTCGATGAACTATTCCC 
1140 1150 1160 1170 1180 1190 1200 


1100 1110 1120 1130 1140 1150 

AATC TTTTTTTAAATTTAAATTTTATGTGT-GTGA-GTGTTTTGCCTGCCTGTATGCACACGTGTGTG 

i i i mi i i n n i n i i i n nil i i i 

CAGCAGGGTCTTTTCATCTGGGAAAGACATCCATAAAGAAGCAATAAAGAAGAGTG — CCACATTTATTTT 
1210 1220 1230 1240 1250 1260 1270 

1160 1170 1180 1190 1200 1210 1220 

TGTGTGTGTGTGTGACACTC-CTGA-TGCCTGAGGAGGTCAG-AAGAGAAAGGGTTGGTTCCATA-AGAACT 

I I II I III II 11 1 II 1 II I llll II II II II I 
TATATCTATATGTACTTGTCAAAGAAGGTTTGTGTTTTTCTGCTTTTGAAA— TCTGTATCTGTAGTGAGAT 
1280 1290 1300 1310 1320 1330 1340 

1230 1240 1250 1260 1270 1280 1290 

GGAGTTATGGA-TGGCTGTGAGCCGGNNNGATAGGTCGGGACGGAGACCTGTCTTCTTATTTTAACGTGACT 

I II II I II I I llll I II I I I II lllll I II I I lllll 

AGCATTGTGAACTGACAGGCAGCCTG--GACA--TAGAGAGGGAGA--AGAAGTCAGA--GAGGGTGACA
1350 1360 1370 1380 1390 1400 

1300 1310 1320 1330 1340 1350 1360 

GTATA-ATAAAAAAAAAATGATATTTCGGGAAT — TGTAGAGATTGTCCTGACACCCTTCTAGTTAA--TGA 

in ii i mi n in n n n i i i i n n i n 

AGATAGAGAGCTATTTAATGGCCGGCTGGAAATGCTGGGCTGACGGTGCAGTCTGGGTGCTCGTCCACTTGT 
1410 1420 1430 1440 1450 1460 1470 

1370 1380 1390 1400 1410 1420 1430 

TCTA — AGAGGAATTGTTGATACGTAGTATACTGTATAT-GTGTATG-TATATGTATATGTATA-TATAAGA 

ill ii mi in i ii nil him n nn i i in 

CCCACTATCTGGGTGCATGATCTTGAGCAAGTTCCTTCTGGTGTCTGCTTTCTCCAT-TGTAAACCACAAGG 
1480 1490 1500 1510 1520 1530 1540 

1440 1450 1460 1470 1480 1490 1500 

CTCTTTTACTGTCAAAGTCAACCTAGAGTGTCTGGTTACCAGGTCAATTTTATTGG — ACATTTTACGTCA 

n n i i i in i in n n i n n n n i nil i i in 

CTGTTGCATGGGCTAA-TGA AGA — TC — ATA-TACGTGAAAATTCTTTGAAAACATATAAAG-CA 


1510 1520 1530 1540 1550 1560 

C— ACACACACAC— ACACACACACACACACGTTTATACTACGTACTGTT ATCGGTATTCTACG — TCATATAA 

i i iii ii mi mi ii mi ii him m hi i ii i 

CTATACAGATTCGAAACTC-CATTGAGTC-ATTATCCTTGCTA-TGATGATGGTGTTTTGGGGATGAGAGGG 
1610 1620 1630 1640 1650 1660 1670 

1570 1580 1590 1600 1610 1620 1630 

TGGGAT-- AGGGTAAAAGGAAACCAAAGAGTGAGTGATATTATTGTGGAGGTGACAGACTACCCCTTCTGGG 

II II I II II III llli 1 I 1111 II I I II III I 

TGCTATCCATTTCTCATGTTTTCC ATTGTTTGAAACAA — AGAAGGTTACCAAGAAGCCTTTCCTGT 

1680 1690 1700 1710 1720 1730 1740 

1640 1650 X 1670 1680 1690 1700 

TACGTAGGGACAGACCTCCTTCGGACTGTCTAAAACTCCCCTTAGAAGTCTCGTCAAGTTCCCGGACGA 

II I II III 

AGCCTTCTGTAGGAATTCC 
1750 1760 


13. ELLIS-012-FIG2AB.SEQ (1-2350)

S53354 B-cell activation protein=B-G antigen IgV domain h 


LOCUS S53354 2574 bp PRI 05-APR-1993
DEFINITION B-cell activation protein=B-G antigen IgV domain homolog [human,
SAC-activated B lymphocytes, Genomic/mRNA, 2574 nt]
ACCESSION S53354
KEYWORDS
SOURCE human SAC-activated B lymphocytes
ORGANISM Unclassified.
Unclassified.
REFERENCE 1 (bases 1 to 2574)
AUTHORS Kozlow,E.J., Wilson,G.L., Fox,C.H. and Kehrl,J.H.
TITLE Subtractive cDNA cloning of a novel member of the Ig gene
superfamily expressed at high levels in activated B lymphocytes.
JOURNAL Blood 81, 454-461 (1993)
STANDARD full automatic
COMMENT This entry [NCBI gibbsq 123744] was created by the journal scanning
component of NCBI/GenBank at the National Library of Medicine.
This sequence comes from Fig. 7 and 1.
FEATURES Location/Qualifiers
mRNA 271..2574
CDS 312..929
/note="B-G antigen IgV domain homolog; For the protein
sequence (NCBI gibbsq 123745): Method: conceptual
translation supplied by author. This sequence comes from
Fig. 1."
/product="B-cell activation protein"
/codon_start=1
/translation="MSRGLQLLLLSCAYSLAPATPEVKVACSEDVDLPCTAPWDPQVP
YTVSWVKLLEGGEERMETPQEDHLRGQHYHQKGQNGSFDAPNERPYSLKIRNTTSCNS
GTYRCTLQDPDGQRNLSGKVILRVTGCPAQRKEETFKKYRAEIVLLLALVIFYLTLII
FTCKFARLQSIFPDFSKAGMERAFLPVTSPNKHLGLVTPHKTELV"
BASE COUNT 649 a 595 c 689 g 641 t
ORIGIN


Initial Score = 143 Optimized Score = 897 Significance = 7.33
Residue Identity = 46% Matches = 1088 Mismatches = 951
Gaps = 279 Conservative Substitutions = 0


X 10 20 

ATGTCC-ATGAACTGCTGAGTG 


TGAGCTGCGCCTACAGCCTGGCTCCCGCGACGCCGGAGGTGAAGGTGGCTTGCTCCGAAGATGTGGACTTGC 
340 350 360 370 380 390 400 410 



30 40 50 60 70 80 

GATAAACAGCAC — GGGATATCTCTGT — CTA-AAGGAATATTACTACACCAGGAAAAGGACACATTCGA 

! II II I lllli II III I II I I II II I III II 
CCTGCACCGCCCCCTGGGATCCGCAGGTTCCCTACACGGTCTCCTGGGTCA--AGTTATTGGAGGGTGGTGA 
420 430 440 450 460 470 480 

90 100 110 120 130 140 150 

CAACAGGAAAGGAGCCTGT CACAG AAAACC A — CAGTGTCCTGTGCATGTGACATTTCGCCATGGGAAACA 

i in i mi i i iii mi in ii in i hi i i in it i 

AGAGAGG-ATGGAGACACCCCAGGAAGACCACCTCAGGGGAC— AGCA-CTATCAT— CAGAAGGGGCAA-A 
490 500 510 520 530 540 


160 170 180 190 200 210 220 

ACTGTTACAACGTGGTGGTCATTGTGCTGCTGCTAGTGGGCTG-TGA — GAA--GGTGGGAGCCGTGCAGA 

I III II 1 II 11 II III I III II III I III III | 
ATGGTTCTTTCGACGCCCCCAATGAAAGGC-CCTA-TTCCCTGAAGATCCGAAACACTACCAGC — TGC — A 
550 560 570 580 590 600 610 

230 240 250 260 270 280 

ACTC-CTGTGATAACTGT-CAGCCTG-GTAC — TTTCTGCAGAAAATACAATCCAGTCTGCAAGAGCTGC 

llll I III II II III I II I lllli II I I III III II II 
ACTCGGGGACATACAGGTGCACTCTGCAGGACCCGGATGGGCAGAGAAACCTA — AGT-GGCA— AGGTGA 
620 630 640 650 660 670 

290 300 310 320 330 340 350 

CCTCCA-AGT-AC — CTTCTCCAGCATAG-GTGGACAGCCGAACTGTAACATCTGCAGAGTGTGTGCAGG — 

1 1 I III II I II III II II I I II I III I I lllli III I 
TCTTGAGAGTGACAGGATGCCCTGCACAGCGTAAAGAAGAGACTTTTAAGAAATACAGAGCG-GAGATTGTC 
680 690 700 710 720 730 740 

360 370 380 390 400 410 420 

CTATTTCAGGTTCAAGAAGTTTTGCTCCTCTACCCACA — ACGCGGAGT-GTGAGTGCATTGAAGGATTCCA 

n i i n n i mi n n n in i ii n in mi i i n 

CTGCTGCTGGCTCTGGTTATTTT-CTACTTAACACTCATCATTTTCACTTGTAAGT — TTGCACGGCTACA 
750 760 770 780 790 800 810 

430 440 450 460 470 480 

TTGCTTGGGGCCACAGTGCACCAGATGTGAAAAGGACTGCAGGCCT — GGCCAG-GAGCT AACGAAGC 

ii in i i i i i n i in i n i mi i n n mi 

GAGTATCTTCCCAGATTTTTCTAAAGCTGGCATGGAACG-AGCTTTTCTCCCAGTTACCTCCCCAAATAAGC 
820 830 840 850 860 870 880 

490 500 510 520 530 540 550 

AGGG-TTGCAAAACCTGTAGCTTGGGAACATTTAATGACCAGAACGGTACTGGCGTCTGTCGACCCTG 

mi i i i in mi i mi in i i in i i i mi 

ATTTAGGGCTAGTGACTCCTCACAAGACAGAAC-TGGTATGAGCAGGA — TTTCTGCAGGTTCTTCTTCCTG 
890 900 910 920 930 940 950 


560 570 580 590 600 610 620 

GACGAACTGCTCTCTAGACGGAAGGTCTGTGCTTAAGACCGGGACCACGGAGAAGGA-CGT — GGTG-TGTG 

i mi n n i nil i i n n i nil n i i in i 

AAGCTGAGGCTC — AG-GGGTGTGCCTGTCTGTTACACTGGAGGAGAGAAGAATGAGCCTACGCTGAAGAT 
960 970 980 990 1000 1010 1020 


630 640 650 660 670 680 690 

GACCCCCTGTGGTGAGCTTCTCTCCCAGTACCACCATTT — CTGTGACTCCAGAGGGAGGACCAGGAGGGCA 

i i linn i nil i i n i i in i n in i i i inn 

GGCATCCTGTGAAGTCCTTCAC-CTCACTGAAAACATCTGGAAGGGGATCCCACCCCATTTTC-TGTGGGCA 
1030 1040 1050 1060 1070 1080 1090 

700 710 720 730 740 750 

CTCCTTGCAGGTCCTTACCT-TGTTCCTGGCGCTGA — CATCGGCT-TTGCTGC-TGGC — CCTGATCTTCA 

in i i i i n i i i n in i i in i n i mi in n i 

GGCCTCGAAAACCATCACATGACCACATAGC-ATGAGGCCACTGCTGCTTCTCCATGGCCACCTTTTCAGCG 
1100 1110 1120 1130 1140 1150 1160 



760 770 780 790 800 810 820 

TTACTCTCCTGTT CTCTGTGCTCAAATGGATCAGGAAAAAATTCCCCCACATATTCAAGCAACCATTTAAGA 

i i i i i ii mi mi ii i i i it i iiiii mi i i m 

AT-GTATGCAG — CTATCTGGTCAA — CCTCCTGGACATTTTTTCAGTCAT ATAAAAGCTA — TGGTGAGA 
1170 1180 1190 1200 1210 1220 1230 

830 840 850 860 870 880 890 

AGACCACTGGAGCA-GCTCAAGAGGAAGATGCTTGTAGCTGCCGATGTCCACAGGAAGAAGAAGGAGGAGGA 

I I Hill 1 1 II 1 I II III II I I I II II 1 III IIIII | 
TG-CAGCTGGAAAAGGGTCTTGGGAAATATGAATG — CCCCCAGCTGGCCCGTGACAGACTCCTGAGGA-CA 
1240 1250 1260 1270 • 1280 1290 

900 910 920 930 940 950 

GGAGGC TATG-AGCTGTGATGTACTATCCTAGGA — GATGTG-TGGGCCG — AAACCGAGAAGCAC 

III I II I II II I III I I I 1111 I II I I II III I 

GCTGTCCTCTTCTGCATCT-TGGGGACATCTCTTTGAATTTTCTGTGTTTTGCTGTACCAGCCCAGATGTTT 


1300 1310 1320 1330 1340 1350 1360

960 970 980 990 1000 1010 1020


TAGGACCCCACCATCCTGTGGAACAGCACAAGCAACCCCACCACCCTG--TTCTTA-CACATCATCCTAGA

II II I II INI IIIII I II II III II II II I 

TACGTCTGGGAGAAATTG ACAGATCAAGCTGTGAGA-CAGTGGGAAATATTTAGCAAATAAT — TTCC 

1370 1380 1390 1400 1410 1420 1430 

1030 1040 1050 1060 1070 1080 1090 

TGATGTG-TGGGCGCGC-ACCTCATCCAAGT-CTCT-TCTAACGCTAACATATTTGTC-TTTACCTTTTTTA 

II 1111 II I II I I III III I II II I III I I II II 
TGGTGTGAAGGTCCTGCTATTACTAAGGAGTAATCTGTGTACAAAGAAATAACAAGTCGATGAACTATTCCC 
1440 1450 1460 1470 1480 1490 1500 

1100 1110 1120 1130 1140 1150 

AATC TTTTTTTAAATTTAAATTTTATGTGT-GTGA-GTGTTTTGCCTGCCTGTATGCACACGTGTGTG 

i i i mi i i n n i n i i i n nil i i i 

CAGCAGGGTCTTTTCATCTGGGAAAGACATCCATAAAGAAGCAATAAAGAAGAGTG — CCACATTTATTTT 
1510 1520 1530 1540 1550 1560 1570 

1160 1170 1180 1190 1200 1210 1220 

TGTGTGTGTGTGTGACACTC-CTGA-TGCCTGAGGAGGTCAG-AAGAGAAAGGGTTGGTTCCATA-AGAACT 

1 I I 1 I III II II I II I II I llfl II II II II I 
TATATCTATATGTACTTGTCAAAGAAGGTTTGTGTTTTTCTGCTTTTGAAA— TCTGTATCTGTAGTGAGAT 
1580 1590 1600 1610 1620 1630 1640 

1230 1240 1250 1260 1270 1280 1290 

GGAGTTATGGA-TGGCTGTGAGCCGGNNNGATAGGTCGGGACGGAGACCTGTCTTCTTATTTTAACGTGACT

i n n i ii i i mi i n i i i n iiiii i n i i iiiii 

AGCATTGTGAACTGACAGGCAGCCTG — GACA — TAGAGAGGGAGA — AGAAGTCAGA — GAGGGTGACA 
1650 1660 1670 1680 1690 1700 

1300 1310 1320 1330 1340 1350 1360 

GTATA-ATAAAAAAAAAATGATATTTCGGGAAT — TGTAGAGATTGTCCTGACACCCTTCTAGTTAA — TGA 

III I I I INI II III II 11 II I I I Mil I II 
AGATAGAGAGCTATTTAATGGCCGGCTGGAAATGCTGGGCTGACGGTGCAGTCTGGGTGCTCGCCCACTTGT 
1710 1720 1730 1740 1750 1760 1770 


1370 1380 1390 1400 1410 1420 1430 

TCTA— AGAGGAATTGTTGATACGTAGTATACTGTATAT-GTGTATG-TATATGTATATGTATA-TATAAGA 

mi ii mi n i i ii mi iiiii n mi i i in 

CCCACTATCTGGGTGCATGATCTTGAGCAAGTTCCTTCTGGTGTCTGCTTTCTCCAT-TGTAAACCACAAGG 
1780 1790 1800 1810 1820 1830 1840 1850 

1440 1450 1460 1470 1480 1490 1500 

CTCTTTTACTGTCAAAGTCAACCTAGAGTGTCTGGTTACCAGGTCAATTTTATTGG — ACATTTTACGTCA 

n n i i i n ii in n n i n ii iiiii i mi ii i n 

CTGTTGCATGGGCTAA-TGA AGA — TC — ATA-TACGTGAAAATTATTTGAAAACATATAAAG-CA 

1860 1870 1880 1890 1900 



1510 1520 1530 1540 1550 1560 

C-ACACACACAC- ACACACACACACACACGTTTATACTACGT ACTGTT ATCGGT ATTCTACG — TCATAT AA 

I I III I I I II I II II III! II II II I III II I I I I I 

CTATACAGATTCGAAACTC-CATTGAGTC-ATTATCCTTGCTA-TGATGATGGTGTTTTGGGGATGAGAGGG 

1910 1920 1930 1940 1950 1960 1970 

1570 1580 1590 1600 1610 1620 1630 

TGGGAT — AGGGTAAAAGGAAACCAAAGAGTGAGTGATATTATTGTGGAGGTGACAGACTACCCCTTCTGGG 

II II I II II I II III I I I Mil II I I II III I 

TGCTATCCATTTCTCATGTTTTCC ATTGTTTGAAACAA — AGAAGGTTACCAAGAAGCCTTTCCTGT 

1980 1990 2000 2010 2020 2030 2040 

1640 1650 1660 1670 1680 1690 1700 

TACGTAGGGACAGACCTCCTTCGG — ACTGTCTAA — AACTCCCCTTAGAAGTCTCG-TCAAGTTCCCGG 

II I II II II II I II II I III I III I III II 

AGCCTTCTGTAGGAATTCTTTTGGGGAAGTGAGGAAGCCAGGTCCACGGTCTGTTCTTGAAGCAGTAGCCTA 
2050 2060 2070 2080 2090 2100 2110 

1710 1720 1730 1740 1750 1760 

A CGAAGA — GGACAGAGGAGACACAGTCCGAAAAGTTATTTTTCCGGCAAATCCTTTCCCTGTTT 

I I llll Illll I I II I I I III I II I II I II I II I 
ACACACT CCAAGATATGGACACACGGGAGCCGCTGGCAGAAGGGACTT — CACGAAGT--GTTGCATGGAT 
2120 2130 2140 2150 2160 2170 2180 

1770 1780 1790 1800 1810 1820 1830 

CGTGACACTCCACCCCTTGTGGACACTTGAGTGTCATCCTTGCGCCGGAAGGTCAGGTGGTACCC — GTCT 

II I I II llll III llll llll III I llll II I I 

-GT TTTAGCCATTGTTGGCTTTCCCTTATCAAACTTGGGCCCTTCCCT-TCTTGGTTTCCAAAGGCA 

2190 2200 2210 2220 2230 2240 

1840 1850 1860 1870 1880 1890 1900 

GTAGGGGCGGGGAGACAGAGCCGCGGGGGAGCTACGAGAATCGACTCACAGGGCGCCCCGGGCTTCGCAAAT 

I II I I II III III I I II I I I II II I II II 
TTTATTGCTGAGTTATATGTTCACTGTCCCCCTAATATTAGGGAGTAAAACGGATACCAAG — TT — GATT 
2250 2260 2270 2280 2290 2300 2310 

1910 1920 1930 1940 1950 1960 1970 

GAAACTTTTTTAATCTCACAAG — TTTCGTCCGGGCTCGGCGGACCTATGGCGTCGATCCTTATTACCTTAT 

i mu in i i mi i i n i n i i n i n n i 

TAGTGTTTTTACCTCTGTCTTGGCTTTCAT-GTTATTAAACGTATGCAT-GTGAAGAAGGGTGTT — TT-T 
2320 2330 2340 2350 2360 2370 2380 


1980 1990 2000 2010 2020 2030 2040 

CCTGGCGCCAAGATAAAACAACCAAAAGCCTTGACTCCGGTACTAATTCTCCCTGCCGGCCCCCGTAAGC — 

I 1 1 I II III II III III I I II II 1 I II I II II 

C-TG — TTTTATATTCAAC — TCATAAGACTT — TGGGATAGGAAAAATGAGTAATGGTTAC — TAGGCTT 
2390 2400 2410 2420 2430 2440 

2050 2060 2070 2080 2090 2100 2110 

-AT AACGCGGCG ATCT CCA CTTTAAGAACCTGGCCGCGTTCTGCCTGGTCTCGCTTTCGT — AAAC- 

III I II III I II II I I III II II II II I II I II llll 
AATACCTGGGTGAT-TACATAATCTGT-ACAACGAACCCCCATGATG-TAAGTTTACCTAT-GTAACAAACC 
2450 2460 2470 2480 2490 2500 2510 

2120 2130 2140 2150 2160 2170 X 2180 

GGTTCT — TACAAAAGTAATT-AGTTCTTGCTTTCAGCCTCCAAGCTTCTGCTAGTCTATGGCAGCATCAAG 

I II III I I I II I I II I II I I I II I I I 

TGCACTTATACCCATGAACTTAAAATGAAAGTTAAAAATAAAAAACATATACAAATAAAAAAAA 


2520 2530 2540 2550 2560 2570 X

2190 2200 2210 2220




GCTGGTATTTGCTACGGCTGACCGCTACGCCGCCGCAATAAG 


14. ELLIS-012-FIG2AB.SEQ (1-2350)


LOCUS HSIL05 6684 bp DNA PRI 03-JAN-1991
DEFINITION Human interleukin-2 (IL-2) gene and 5'-flanking region
ACCESSION X00695 X00200 X00201 X00202
KEYWORDS growth factor; interleukin; T-cell growth factor.
SOURCE human
ORGANISM Homo sapiens
Eukaryota; Animalia; Metazoa; Chordata; Vertebrata; Mammalia;
Theria; Eutheria; Primates; Haplorhini; Catarrhini; Hominidae.
REFERENCE 1 (bases 1 to 6684)
AUTHORS Holbrook,N.J., Lieber,M. and Crabtree,G.R.
TITLE DNA sequence of the 5' flanking region of the human interleukin 2
gene: homologies with adult T-cell leukemia virus
JOURNAL Nucleic Acids Res. 12, 5005-5013 (1984)
STANDARD full automatic
REFERENCE 2 (bases 1 to 6684)
AUTHORS Degrave,W., Tavernier,J., Duerinck,F., Plaetinck,G., Devos,R. and
Fiers,W.
TITLE Cloning and structure of the human interleukin 2 chromosomal gene
JOURNAL EMBO J. 2, 2349-2353 (1983)
STANDARD full automatic
REFERENCE 3 (bases 1 to 6684)
AUTHORS Taniguchi,T., Matsui,H., Fujita,T., Takaoka,C., Kashima,N.,
Yoshimoto,R. and Hamuro,J.
TITLE Structure and expression of a cloned cDNA for human interleukin-2
JOURNAL Nature 302, 305-310 (1983)
STANDARD full automatic
FEATURES Location/Qualifiers
promoter 1339..1344
/note="TATA-box"
precursor_RNA 1363..6403
/note="primary transcript"
intron 1563..1652
/note="intron I"
intron 1713..4004
/note="intron II"
intron 4149..6009
/note="intron III"
misc_feature 6382..6387
/note="polyadenylation signal"
CDS join(1416..1562,1653..1712,4005..4148,6010..6117)
/product="interleukin"
/codon_start=1
/translation="MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMI
LNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRP
RDLISNINVIVLELKGSETTFMCEYADETATIVEFLNRWITFCQSIISTLT"

BASE COUNT 2342 a 1113 c 1064 g 2165 t 
ORIGIN 

Initial Score = 142 Optimized Score = 737 Significance = 7.26 

Residue Identity = 47% Matches = 905 Mismatches = 732

Gaps = 257 Conservative Substitutions = 0 

X 10 20 

ATGTCCATG— AACTGCTGAGT 

I II III I II II 

ATGAATCACTTATTAGTGGACTGTTTCAGTTGAATTAAAAAAATACATTGAGATCAATGTCATCTAGACATT 
4880 4890 4900 4910 4920 X 4930 4940 

30 40 50 60 70 80 

G-GATAAACAGCACGGGATATCTCTGTCTAAAG GAATATTACT-ACACCAGGAAAAGGACACAT 

i ii iii i iiiii ii i i it ii n i i m m ii ii m 

GACAGATTCAGTTC— CTTATCTATGGCAAGAGTTTTACTCTAAAATAATTAACATCAGAAA ACTCAT 

4950 4960 4970 4980 4990 5000 

90 100 110 120 130 140 150


TC-GACAACAGGAAAGGAGCCTGTCACAGAAAACCA — CAGTGTCCTGTGCA-TGTGACATTTCGCCATGGG 

hi i ii i i i i i i Mini ii in i mi mi i i 

TCTTAACTCTTGATACAA--ATTTAAGACAAAACCATGCAAAAATCTGAAAACTGTG--TTTC-AAAAGCC
5010 5020 5030 5040 5050 5060 5070 

160 170 180 190 200 210 

AAACAACTGTTACAACGTGGTGGTC AT-TG TGCTGCTGCTAGTGGGCTGTGAGAAGGT-GGGA 

Hill I III II II II II II III III I III I I II 
AAACACTTTTTAAAATAAAAAAATCCCAAGATATGACAATATTTAAACAATTATGCT-TAAGAGGATACAGA 
5080 5090 5100 5110 5120 5130 5140 


220 230 240 250 260 270 280 

GCCGTGCAGAACTCCTGTGATAACTGTCAGCCTGGTACTTTCTGCAGAAAATACAATCCAGTCTGCAAGAGC 

i mi i i ii i n i i n n in in i i in nn i 

ACACTGCAACAGT-TTTTTAAAAGAG-AATACT — TA-TTTAAAGGGAACACTCTATCTCACCTGCTTTTGT 
5150 5160 5170 5180 5190 5200 5210 

290 300 310 320 330 340 350 

TGCCCTCCAAGTACCTTCTCCAGCATAGGTGGACAGCCGAACTGTAACATCTGCAGAGTGTGTGCAGGCTAT 

i n n i n i n i n i ii i n i i tin n i i i n n 

TCCCAGGGTAGGA— ATC-ACTTCAAATTTGAAAAG-CTCTCTTTTAAATCT-CACTATATAT-CAAAATAG 
5220 5230 5240 5250 5260 5270 

360 370 380 390 400 410 420 

TTCAGGTTCAAGAAGTTTTGCTCCTCTAC-CCACAACGCGGAGTGTGAG-TGCATTGA--AG-GATTCC— A 

II I II II III II II II III II I 1 II I II II II I 

TT-GCCTC CTTAGCTTATCAACTAGAGGAAGCGTTTAAATAGCTCCTTTCAGCAGAGAAGCCTAA 

5280 5290 5300 5310 5320 5330 5340 

430 440 450 460 470 480 

TTGCT-TGGGGCCA — CAGTGCACCAGATGTGAAAAGGACTGCAGGCCTGGCCAGG — AGCTAACGAAGCA 

II II till II I II I II I II I I I II I II II II II 

TTTCTAAAAAGCCAGTCCACAGAACAAAATTTCTAATG — TTTAAAGCTTTTAAAAGTTGGCAAATTCACCT 
5350 5360 5370 5380 5390 5400 5410 

490 500 510 520 530 540 550 

GGGTTGCAAAACCTGTAGCTTGGGAACATTTAATGACCAGAACGGTACTGGCGTCTGTCGACCCTGGACGAA 

I III I II I I I I III I II II I II II I III II II II 
GCATTG-- ATAC-TAT-GATGGGGTAGGGATAGGTGTAAGTA-TTTA-TGAAG-ATGTTCATTC-ACACAAA 
5420 5430 5440 5450 5460 5470 

560 570 580 590 600 610 620 

CTGCTCTCTAGACGGAAGGTCTGTGCTTAAGACCGGGACCACGGAGAAGGACG — TGGTGTGTGGACCCCCT 

I I I I II I I I III II I I II I I II II I II I III I 

TT— TACCCAAACAGGAAGCATGTCCTACCTAGC-TTACTCTAGTGTAGCTCGTTTCGTCTTTGGGGAAAAT 


5480 5490 5500 5510 5520 5530 5540

630 640 650 660 670 680 690


GTGGTGAGCTTCTCTCCCAGTACCACCAT TTCTGTGACTCCAGAGGGAG-GACCAGGAGGGCACTCCT 

i in in ii nil in n i i n in n i i n ii 

ATAAGGAGATTCACT-TAAGTAGAAAAATAGGAGACTCT-AATCAAGATTTAGAAAAGAAGAAAGTATAATG 
5550 5560 5570 5580 5590 5600 5610 

700 710 720 730 740 750 

TGCAGGTC CTTACCTTGTTCCTGGCGCTGACATCGGCTTTGCTGCTG GCCCTGATCTTCAT 

nil n i in n i n i i i n n i mi n i i n ii 

TGCATATCAATTCATACATT-TAACTTACACAAATATAGGTGTACATTCAGAGGAAAAGCGATCAAGTTTAT 
5620 5630 5640 5650 5660 5670 5680 

760 770 780 790 800 810 

TACTC-TCCTG — TT--CTCTGTGCTCAAATGGA — TCAGGAAAA AATTCCCCCACATATTCAAGCA 

I I I III I II I I II I II I II III III I I II II II 
TTCACATCCAGCATTTAATATTTGTCTAGATCTATTTTTATTTAAATCTTTATTTGCACCCAATTTAGGGAA 
5690 5700 5710 5720 5730 5740 5750 

820 830 840 850 860 870 880


ACCATTTAAGAAGAC--CACTGGAGCAGCTCAAGAGGAAGATGCTTGTAGCTGCCGATGT--CCA-CAGGA

i mi i i mi mm nun n n nn i i in i i n 

AAAATTTTTGTGTTCATTGACTGAATTAACAAATGAGGAAAAT-CT — CAGCTTCTG-TGTTACTATCATTT 
5760 5770 5780 5790 5800 5810 5820 

890 900 910 920 930 940 950 

AGAAGAAGGAGGAGGAGGAGGCTATGAGCTGTGATGTACTATCCTAGGAGATGTGTGGGCCGAAACCGAGAA 

I I I I I I I II II I II I 1 II I I II 1 1111 I III 

GGTATCATAACAA— AATACGCAAT TTTGGCATTC-AT-TTTGATCATTT CAAGAAAATGTGAA 

5830 5840 5850 5860 5870 5880 

960 970 980 990 1000 1010 1020 

GCACTAGGACCCCACCATCCTGTGGAACAGC — ACAAGCAACCCCACCACCCTGTTCTTACACATCATCCT 

II I II I III I III I II II I 11 II I I I II III 

TAATT AATATGTT-TGGTA-AGCTTGAAAATAAAGGCAACAGGCC — TATAAGACTTCAATTG 

5890 5900 5910 5920 5930 5940 

1030 1040 1050 1060 1070 1080 1090 

AGATGA-- TGTGTGGGCGCGCACCTCATCCAAGTCTCTTCTAACGCTAACATATTTGTCTT-TACCTTTTTT 

II I III I II II II II II II llllll III I II II II 

GGAATAACTGTATATAAGGTAAACTACTC— TGTACTTTAAAAAATTAACATTTTTCTTTTATAGGGATCTG 
5950 5960 5970 5980 5990 6000 6010 


1100 1110 1120 1130 1140 1150 1160 

AAATCTTTTTTTAAATTTAAATTTTATG-TGTGTGAG--TGTTTTGCCTGCCTGTATGCACACGTGTGTGTG

III I II I I I III I II II I II III III II III II I 
AAA-CAACATTCATGTGTGAATATGCTGATGAGACAGCAACCATTG-TAGAATTTCTGAACAGATGGATTAC 
6020 6030 6040 6050 6060 6070 6080 

1170 1180 1190 1200 1210 1220 

TGTGTGTGTGTGACACTCCTGATGCCTGAGGAGGTCA-GAAGAG — AAAGGGTTGGTTC-CATAAGAAC-TG 

tin i i n i nn i i i in i n i n n i i 

CTTTTGTCAAAGCATCATCTCAACACTGACTTGATAATTAAGTGCTTCCCACTTAAAACATATCAGGCCTTC 
6090 6100 6110 6120 6130 6140 6150 

1230 1240 1250 1260 1270 1280 1290 

GAGTTATGGATGGCTGTGAGCCGGNNNGATAGGTCGGGACGGAGACCTGTCTTCTTATTTTAAC-GTGACTG 

i nil i in n i i i i i n i nil nn i i i 

TATTTATTTAAATATTTAAATTTTATATTTATTGTTGAATGTATGGTTTGCTACCTATTGTAACTATTATTC 


6160 6170 6180 6190 6200 6210 6220 6230

1300 1310 1320 1330 1340 1350 1360 1370


TATAATAAAAAAAAAATGATATTTCGGGAATTGTAGAGATT-GTCCTGACACCCTTCTAGTTAATGATCTAA 

i nil nn n i in n n n nn i n i n nn i inn 

T-TAATCTTAAAACTAT-AAATAT-GGATCTTTTA-TGATTCTTTTTGTAAGCC— CTAG — GGGCTCTAA 
6240 6250 6260 6270 6280 6290 

1380 1390 1400 1410 1420 1430 

GAGG AATTGTTGAT-ACGTAGTATACTGTA-TATG-TGTATGTATATGTATATGTATATAT-AAGAC 

ii in n n i i in n nn n nil n nn nn in in 

AATGGTTTCACTTATTTATCCCAAAATATTTATTATTATGTTGAATGT-TAAATATA-GTATCTATGTAGAT 


6300 6310 6320 6330 6340 6350 6360

1440 1450 1460 1470 1480 1490 1500


TCTTTTACTGTCAAAGTCAACCTAGAGTGTCTGGTTACCAGGTCAATTTTATTGGACATT — TTACGTCACA 

i n n mi i n i n i i i i n nn in in i i 

TGGTT — AGTAAAA — CTATTTAATAAATTTGATAA — ATATAAACAAGCCTGGATATTTGTTATTTTGGA 
6370 6380 6390 6400 6410 6420 

1510 1520 1530 1540 1550 1560 1570 

CACA-CACACACACACACACACACACGTTTATACTACGTAC — TGTTATCGGTATTCTACGTCATATAATG 

in nil i i n m in i n in in i i in i n n in 

AACAGCACAGA-GTAAGCATTTAAATATTTCT— TAGTTACTTGTGTGAACTGTAGGAT-GGT— TAAAAT- 
6430 6440 6450 6460 6470 6480 6490 


■ACCCCTT 


GGATAGGGTAAAAGGAAACCAAAGAGTG-AGTGATAT-TATTGTGGAGGTGACAGACT- 

I II Hill II II II till II III II lllil 
GCTTA — CAAAAGTCACTCTTTCTCTGAAGAAATATGTAGAACAGAGATG-TAGACTTCTCAAAAGCCCTT 
6500 6510 6520 6530 6540 6550 6560 

1640 1650 1660 1670 1680 1690 

-CTGGGTACGT — AGGG ACAGACCTCCTTCGGACTGTCTAAAACTCCCCTTAGAAGTCTCGTCAAGT 

ii ii i i mi mu mi i m i i ii mu 11 i i i 

GCTTTGTCCTTTCAAGGGCTGATCAGAC-CCTTAGTTCTGGC — ATCT — CTTAGCAGATT— ATATTT 
6570 6580 6590 6600 6610 6620 

1700 1710 1720 1730 1740 1750 1760 X 

TCC — CGGACGAAGAGGACAGAGGAGACACAGTCCGAAAAGTTATTTTTCCGGCAAATCCTTTCCCTGTTTC 

III I II I I 11 1 I I III II II II III II II III I 

TCCTTCTTCTTAAAATGCCA-AACACAAACACTCTTGAA ACTCTTC ATAGATTTGGTGTGGC 

6630 6640 6650 6660 6670 6680 X 


1770 1780 1790 1800 1810 

GTGACACTCCACCCCTTGTGGACACTTGAGTGTCATCCTTGCGCCGGAAG 


15. ELLIS-012-FIG2AB.SEQ (1-2350)

RATTGFB Rat transforming growth factor-beta (TGF-beta) mas 


LOCUS RATTGFB 6244 bp ss-mRNA ROD 09-OCT-1990
DEFINITION Rat transforming growth factor-beta (TGF-beta) masking protein
large subunit, complete cds.
ACCESSION M55431
KEYWORDS transforming growth factor-beta 1 binding protein;
transforming growth factor-beta masking protein.
SOURCE Rat (strain Wistar) kidney, cDNA to mRNA.
ORGANISM Rattus norvegicus
Eukaryota; Animalia; Chordata; Vertebrata; Mammalia; Theria;
Eutheria; Rodentia; Myomorpha; Muridae; Murinae.
REFERENCE 1 (bases 1 to 6244)
AUTHORS Tsuji,T., Okada,F., Yamaguchi,K. and Nakamura,T.
TITLE Molecular cloning of the large subunit of transforming growth
factor type beta masking protein and expression of the mRNA in
various rat tissues
JOURNAL Proc. Natl. Acad. Sci. U.S.A. 87, 8835-8839 (1990)
STANDARD full automatic
FEATURES Location/Qualifiers
mRNA <1..6244
/gene="TGF-beta masking protein large subunit"
CDS 334..5472
/gene="TGF-beta masking protein large subunit"
/note="putative"
/product="TGF-beta masking protein large subunit"
/codon_start=1
/translation="MAGAWLRWGLLLWAGLLAWSAHGRVRRITYVVRPGPGLPAGTLP
LAGPPRTFNVALDARYSRSSTATSSRSLAGPPAERTRRTSQPGGAALPGLRSPLPPEP 
ARPGAPSRGLHSKAGAQTAVTRFAKHGRQVVRSKVQQDTQSSGGSRL8VGQKGQLQGI 
NVCGGGCCHGWSKAPG5QRCTKPSCVPPCGNGGMCLRPQFCVCKPGTKGKACEITAAQ 
DTMSPVFGGQNPGSSWVPPEPAAKRTSTKKADTLPRVSPVAQMTLTLKPKPSMGLSQQ 
IHSQVAPLSSQNVMIRHGQTGEYVLKPKYFPAPKVVSGEQSTEGSFSLRYGGEQGTAP 
FQVSNHTGRIKVVFTPSICKVTCTKGNCHNSCGKGNTTTLISENGHAADTLTATNFRV 
VICHLPCMNGGQCSSRDKCQCPPNFTGKLCQIPVLGASMPKLYQHAQQPGKALGSHVI 
HSTHTLPLTNTNGQGVKVKFPPNIVNIHVKHPPEASVGIHQVSRIDGPVGQRVKEVQP 
GGSGVSYQGLPVQKTQTVHSTYSHQQVIPHVYPVAAKTQLGRCFQETIGSQCGKALPG 
LSKQEDCCGTyGTSWGFNKCGKCPKKQSYHGYTQMMECLGGYKRVNNTFCGDINECGL 
QGVCPNGECLNTMGSYRCSCKMGFGPDPTFSSCVPDPPHISEEKGPCYRLVSPGRGCM 
HPLSVHLTKQICCCSVGKAWGPQCEKCPLPGTAAFKEICPGGMGYTVSGIHRRRPIHG 
HIGKEAVFVKPKNTQPVAKSTHPPPLPAKEEPVEALTSSREHGPGVAEPEVVTAPPEK 
EIPSLDQEKTRLEPGQPQLSPGVSTIHLHPQFPVVVEKTSPPVPVEVAPEGSTSSASG 
VIAPTGVTEINECTVNPDICGAGHCINLPVRYTCICYEGYKFSEQQRKCIDIDECAQA 
flui rcnrorcwrcrcci rircirci AccercMrTnumrri DDnur Dor Dn mt ao atd 



CEyCDSGYRMSRRGHCED I DECLTPST CPEEQCVNSPGS YQCVPCTEGFRGWNGQCLD 
VDECLQPKVCTNGSCTNLEGSYNCSCHKGYSPTPDHRHCQDIDEC8QGNLCMNGQCKN 
TDGSFRCTCGQGYQLSAAKD3CEDIDECEHRHLCSHGQCRNTEGSF8CLCN8GYRASV 
LGDHCEDINECLEDSSVC8GGDCINTAGSYDCTCPDGLQLNDNKGC8DINECASPGLC 
APHGECLNTQGSFHCVCESGFSISADGRTCEDIDECVNNTVCDSHGFCDNTAGSFRCL 
CY8GFQAP6DGQGCVDVNECELLSGVCGEAFCENVEGSFLCVCADENQEYSPMTG8CR 
SRATEDSGVDR8PKEEKKECYYNLNDASLCDNVIAPNVTKQECCCTSGAGWGDNCEIF 
PCPVQGTAEFSEMCPRGKGFVPAGESSYETGGENYKDADECLLFGEEICKNGYCLNTQ 
PGYECYCKEGTYYDPVKLQCFDMDECQDPNSCIDG8CVNTEGSYNCFCTHPHVLDASE 
KRCVQPTESNEQIEETDVYQDLCWEHLSEEYyCSRPLVGKSTTYTECCCLYGEAWGMS 
CALCPMKDSDDYAQLCNIPVTGRRRPYGRDALVDFSEQYGPETDPYFI8DRFLNSFEE 
L8AEECGILNGCENGRCVRV8EGYTCDCFDGYHLDMAKHTCVDVNECSELNNRMSLCK 
NAKCINTEGSYKCyCLPGYVPSDKPNYCTPLNTALNLDKDSDLE" 

BASE COUNT 1539 a 1643 c 1686 g 1376 t
ORIGIN 

Initial Score = 141 Optimized Score = 993 Significance = 7.19 
Residue Identity = 47% Matches = 1204 Mismatches = 1005
Gaps = 303 Conservative Substitutions = 0 


X 10 20 

ATGTCCATGAACTGCTGA — GT 


CCCGATGTGTGT AGGGACGGCCGCTGCATCAACACTGCTGGGGCCTTCCGATGCGAAT--ACTG-TGACAGT
3200 3210 3220 3230 3240 3250 3260 


30 40 50 60 70 80 

GGATA AACAGCACGGGATATCTGTGTCTA-AAGGAATATT-ACT-ACACCAGGAAAAGGACACATT 

ii ii ii iiiii n n i i ii i i ii ii m i i mi 

GGGTACCGGATGTCACGACGGGGCCACTGTGAGGATATCGATGAGTGTCTGACCCCAAGTACCTGTCCCGAG 
3270 3280 3290 3300 3310 3320 3330 


90 100 110 120 130 140 150 

CGACAA-CAGGAAAGGAGCCTGTCACAGAAAACCACAGTGTCCTGTGCATGTGACATTTCGCCATG — GGA 

mi i ii ii i i mi mi i mi i n ii n in 

GAACAATGCGTGAATTCC-CGAGGTTC — TTACCAGTGTGTGCCCTGCACAGAAGGGTT — CCGTGGCTGGA 
3340 3350 3360 3370 3380 3390 3400 

160 170 180 190 200 210 

A--ACAACTGTTACAACGTGGTGGTCATTGTGCTGCTGCTAGTGGGCTGTGAGAA-GGTGGGAGC C

i mi n i i mi i i n mi n i n nil n in n i 

ATGGACAA-TGCCTCGATGTGGACG— AGTG-CCTGCAGCCAAAGGTCTGTACCAATGGTTCCTGCACCAAC 
3410 3420 3430 3440 3450 3460 3470 

220 230 240 250 260 270 280 

GTGCAGAACTCC TGTGATAACTGTCAGCCTGG-TACTTTCTGCAGAAAATACAATCCAGTCTG-CAAG 

n i nn nil i in n n in i n i i n n i in nil 

CTGGAAGGCTCCTACATGTG-TTCCTGCCACAAGGGCTAC-AGCCCCACACCAGACCATAGACACTGTCAAG 
3480 3490 3500 3510 3520 3530 3540 

290 300 310 320 330 340 

A GCTGCCCTCCAAGTACCTTCTCC— AGCATAGGTGGACAGCCGAACTGTAACATCTGCAGAGTGTGT 

I I II I III II 1111 II III II 1111 III I I I 

ATATTGATG-AATGTCAGCAAGGGAACCTGTGCATGAACGGGCAGTGCAA — AAACA-CTGACGGCTCCTT 
3550 3560 3570 3580 3590 3600 3610 

350 360 370 380 390 400 410 

GCAGGCTATTTCAGGTTCAAGAAGTTTTGCTCCTCTACCCACAACGCGGAGTGTGAGTGCATTGAAGGATTC 

i i n i n n i i m n i n i i iiiii iiiii i n i 

CCGGTGTACCTGTGG-GCAGGGCTATCAGCT-GTCAGCGGCTAAAGACCAATGTGAAGATATTGACGAATGC 
3620 3630 3640 3650 3660 3670 3680 


420 430 440 450 460 470 480 

CATTGCTTGGGGCCACAG— TGCACCAGATGTG-AAAAGGACTGCAGGC-CTGGCCAGGAGCTAACGAAGCA 

i i i n i i in i in i i n i in i nil i n n 

nar:i'ir-rCTrArrTrTrrTrTrArrrf:r&f;Trr AccAArArAriscrrrTrrTTrrarTrTTTrTrraArrA 


3690 3700 3710 3720 3730 3740 3750 

490 500 510 520 530 540 550 

GGGTTGCAAA — ACCTGTAGCTTGGGAACATTTAATGACCAGAACGGTACTGGCGTCTGTCGACCCTGGACG 

mu ii i i mi min n i n i n i i 11 n i 11 mi 

GGGTTACAGAGCATCTGT-GCTTGGAGAC — CACTG-CGAGGATATCAATGAATGCT-TGGA GGAC- 

3760 3770 3780 3790 3800 3810 

560 570 580 590 600 610 620 

AACTGCTCTCTAGACGGAAGGTCTGTGC-TTAAGA — CCGGGACCACGGAGAAGGACGTG-GTGTGTGGAC 

ii i m ii mi m i ii i i m ii ii i mu i mu 

-AGTAGTGTCTGCCAGGGAGGTGACTGCATCAATACAGCAGGGTCCTATGA-CTGCACGTGCCCGGATGGAC 


3820 3830 3840 3850 3860 3870 3880 

630 640 650 660 670 680 690 


CCCCTGTG-GTGAGCTTCTCTCCCAGTACCACCATTTCTGTGACTCCAGAGGGAGGACCAGGAGGGCACTCC 

II II III 1 I II INI II 1 II II Hill II I III 

TCCAGCTGAATGA-CAATAAGGGCTGTCAAGACATTAATGAATGTGCACAGCCAGGACTCTGTGCAC-CTCA 


3890 3900 3910 3920 3930 3940 3950 

700 710 720 730 740 750 760 


T-TGCAGGTCCTTACCTTGTTCCTGGCGCTGACATCGGCTTTG-CTGCTGGCC--CTGATCTTCATTACTCT 

I I II II I I I III I II II II III II I I III III 1111 
TGG'GGAGTGTCTAAAC— ACACAAGGCTC — ATTCCACTGTGTCTG-TGAACAAGGGTTCTCCAT — CTCT 
3960 3970 3980 3990 4000 4010 

770 780 790 800 810 820 

CC — TGTTC— TCT-GTGCTCAAATGGATCAG-GAAAAAATTCCCCCACATATTCAAGCAACCATTTAAGA 

I II II II III III III II I II I I II II II II I 

GCAGATGGTCGTACTTGTGAAGATATTGATGAGTGTGTTAACAACACTGTGTGTGACAGTCACGGCTTCTG- 


4020 4030 4040 4050 4060 4070 4080 

830 840 850 860 870 880 890 


AGACCACTGGAGCAGCTCAAGAGGAAGATGCTTGTA GCTGCCGATGTCCACAGGAAGAAGAAGGAGG 

in n i mi ii ii n in i i ilium i mi 1 

TGACAACACAGCCGGCTCTTTCCGCTGCCTCTGTTATCAGGGCTTTCAAGCCCCACAGGATGGGCAAGG-GT 
4090 4100 4110 4120 4130 4140 4150 


900 910 920 930 940 950 960 

AGGAGGAGGCTATGAGCTGTGATGTACTATCCTAGGAG-ATGTGTGGGCCGAAACCGAGAAGCACTAGGACC 

I III I 1 Hill II II II 1 Hill II I I III I III 
GTGTGGATGTGAACGAATGTGAACTGC— TCAGTGGTGTATGTGGGGAGGCTTTCTGTGAA-AATGTGGAAG 
4160 4170 4180 4190 4200 4210 4220 

970 980 990 1000 1010 1020 1030 

CCACCATCCTGTG-GAACAGCACAAGCAACCCCACCACCCTGTTCTTACACATCATCCTAGATGATGTGTGG 

ii mini i mi in i n i im i hi i mi i 

GGTCCTTCCTGTGCGTGTGTGCCGATGAGAACCAGGA GTACAGCCCCATGA-CTGG--GCAGTGTCG 

4230 4240 4250 4260 4270 4280 4290 

1040 1050 1060 1070 1080 1090 

— GCGCGCACCT — CATCCAAGT CTCTTCT AACGCTAA-CATATTTGTCTTTACCTTTTTTAAATC 

II II II 11 INI 11 II I I 1 II II I 11 II 1111 

CTCCCGGGCTACTGAAGATTCAGGTGTGGATCGTC-AGCCCAAAGAAGAAAAGAAGGAGTGTTATTATAATC 
4300 4310 4320 4330 4340 4350 4360 


1100 1110 1120 1130 1140 1150 1160 

TTTTTTTAAATTTAAATTTTATGTGTG'TGAGTGTTTTGCCTGCC — TGTATGCACACGTGTGTGTGTGTGTG 

ii ni ii mi i n n i n n n n i in n n 

TCAAT — GATGCCA — GTCTCTGTGATAACGTGCTGGCCCCCAACGTCACCAAACAAGAGTG-CTG-CTG 


4370 4380 4390 4400 4410 4420 

1170 1180 1190 1200 1210 1220 1230 


TGTGTGACACTCCTGATGCCTGAGGAGGTCAGAAGAGAAAGGGTTGGTT — CCA-TA-AG — AACTG — GAG 

ii i n i in m in i n i i n in i n mi in 

TArATrnnrrprr r:f:rTcocr:A-r:ACAATTr:Tr‘AfVATrTTrrrTTrrprAr.:TrrA£r:r.£ArTr;f'Tr!An 


4430 4440 4450 4460 4470 4480 4490 

1240 1250 1260 1270 1280 1290 

TTAT-GGATGGCTGTGAGCCGGNNNGATAGGT CGGGACGGAGACCTGTCTTCTTATTTTAACGTGA 

ii i iii mi i i i mi i i inn n nm i n 

TTCTCGGA — AATGTGCCCTAGAGGAAAAGGTTTTGTCCCTGCTGGAGA — ATCCTCTTACGAAACCGGTG 
4500 4510 4520 4530 4540 4550 4560 

1300 1310 1320 1330 1340 1350 1360 

CTGTATAATAAAAAAAAAATGA-TATTTC— GGGAATTGTAGAG— ATTGTCCTGACACCCTTCTAGTTAAT 

n i n in i in i i i i in nil i i i i i n nn i 

GTGAGAACTACAAAGATGCTGACGAATGCCTGCTGTTTGGAGAGGAAATCTGCAAAAAC GGTTACT 

4570 4580 4590 4600 4610 4620 

1370 1380 1390 1400 1410 1420 1430 

GATCTAAGAGGAATTGTTGATACGTAGTATACTGTATATG— TGTATGTA-TATGTATATGTATATATAAGA 

i i n i i i i n i i inn in i i n n i in i n i i 

GTTTGAACACTCAGCCTGGGTATGAATGCTACTGCA-AGGAAGGGACATACTACGATCCTGT-CAAATTACA 
4630 4640 4650 4660 4670 4680 4690 

1440 1450 1460 1470 1480 1490 

CTCTTTTACTGTCAAAGTCAACCTAGA--GTGTCTGGT-TA-CCAGGTCAATTTTATT-GGACATTTTACGT 

I 1111 I I I I II III I I I I II I I 1 II I I II III II 
GTGTTTTGATATGGATGAATGCCAAGACCCTAACAGTTGTATCGATGGCCAGTGTGTTAATACAGAGGGC-T 
4700 4710 4720 4730 4740 4750 4760 

1500 1510 1520 1530 1540 1550 1560 

CACACACACACACACACACACACACACACGTTTATACTACGTACTGTTATCGGTATTCTAC-GTCATAT-AA 

I III I III III II II I I I III I I I I I I II Ml 

CTTACAACTGCTTTTGCACCCACCCAATGGTCCTGGATGCCT-CTGAGAAGAGATGTGTGCAGCCAACTGAA 
4770 4780 4790 4800 4810 4820 4830 

1570 1580 1590 1600 1610 1620 1630 1640 

TGGGATAGGGTAAAAGGAAACCAAAGAGTGAGTGATATTAT-TGTGGAGGTGACAGACTACCCCTTCTGGGT 

I II I III III I II II I I II 1111 II II I I I I II 

TCAAAT-GAACAAATAGAAGAAACCGA-TGTCTATCAAGATCTGTGCTGG-GA-- GCATCTGAGTGAGGAGT 
4840 4850 4860 4870 4880 4890 4900 

1650 1660 1670 1680 1690 1700 1710 

ACGTAGGGACAGACCTCCT-TCGGACTGTCTAAAACTCCCCTTAGA-AGTCTCGTCAAGTTCCCGGACGAAG 

nil i n i nn i i n n i i i in n i i in i in i 

ACGT— GTGTAGCCGTCCTCTTGTA— GGCAAGCAGACGACATACACAGAGTGCTGCTGTT— TGTACGGGG 
4910 4920 4930 4940 4950 4960 4970 

1720 1730 1740 1750 1760 1770 

AGGACAGAGGAGACACAGTCCGAAAAGT TATTTTTCCGGCAAAT-CCT-TTCCCTGTTTCGTGACACT 

in n n i nn i i n n n n i in i i in n 

AGG-CATGGGGCATGCAGTGTGCTCTCTGCCCCATGAAGGACTCAGATGACTATGCCCAGCT— GTG-CA— 
4980 4990 5000 5010 5020 5030 

1780 1790 1800 1810 1820 1830 1840 

CCACCCCTTGTGGACACTTGAGTGTCATCC — TTGCGCCGGAAGGTCAGGTGGTAC — CCGT — CTGTAGG 

n in in nil i ii n n i in i inn n i n i in 

ACATCCC-TGT-GACAGGACGGCGGCGACCATATGGACGGGATGCGTTGGTGG-ACTTCAGTGAACAGTA-T 
5040 5050 5060 5070 5080 5090 5100 

1850 1860 1870 1880 1890 1900 

GGCGGGGAGACAGAGCCGCGGGGGAGCTACGAGAATCGACT — CACAGGGCGCCCCGG-GCTTC — GCAAAT 

in n mn n i i in nn i i i n n in i n i 

GGCCCAGAAACAGACCCTTACTTCA— TTC-AGGATCGCTTTCTAAACAGCTTTGAGGAGCTACAGGCTGAG 
5110 5120 5130 5140 5150 5160 5170 

1910 1920 1930 1940 1950 1960 1970 

GAAACTTTTTTAATCTCACAAGTTTCGTCCGGG — CTCGGCGGACCTATGGCGTCGATCCTTATTACCTTAT 

in ii in i in him in i n n i n ii nil 

r-A a Tr-Trr-rAT^-rTr-AA /'pr / v T»**Tr*A a a a Trr- /■‘tt-t^ta Arrr-TT/'ArrA ArrTT at 


5180 5190 5200 5210 5220 5230 



1980 1990 2000 2010 2020 2030 2040 

CCTGGC GCCAAGAT — AAAACAAC--CAAAAG-CCTTGA-CTCCGGTACTAATTCTCCCTGCCGGCCC 

li il II III I I II I I I I II II II II I I I I II II 
ACTTGCGATTGCTTTGATGGATATCATCTGGATATGGCCAAGATGACCTGTGTTGA-TGTAAATGAATGCAG 
5240 5250 5260 5270 5280 5290 5300 


2050 2060 2070 2080 2090 2100 2110 

CCGTAAGCATAACGCGGCGATCTCCACTTTAAGAAC-CTGGCCGCGTTCTGCCTGGTCTCGCTTTCGTAAAC 

i i mi in in i ii mill i n n ii i i i i i i 

CGAGCTGAATAA-TCGGATGTCT-CTCTGCAAGAACGCCAAGTGCATTAACACAGAAGGCTCCTACAAATGC 
5310 5320 5330 5340 5350 5360 5370 


2120 2130 2140 2150 2160 2170 2180 

G-GTTCTTACAAAAGTAATTAGTTCTTGCTTTCAGCCTCCAAGCTTCTGCTAGTCTATGGCAGCATCAAGGC 

i ii iii i ii ii i i ii ii i mi ii iii ii ii ii n i i 

GTGTGTCTACCAGGCTACGTA — CCAT-CTGACAAGC-CCAA-CTACTG-TACACCACTG-AACACC — GOT 
5380 5390 5400 5410 5420 5430 

2190 2200 2210 2220 2230 2240 

TGGTATTT — GCTACGGC — TGACC-GCTACGCCGCCGCAATAAGGGTACTGGGCGGCCCGTCGA — AGGCC 

i i mi i i i i inn i i i i i ii n nil 1 1 n 

TTGAATTTAGACAAAGACAGTGACCTGGAGTGAAGGAGAAGCTACGTAAC— CTATGCCCATATACTCTGCA 
5440 5450 5460 5470 5480 5490 5500 

2250 2260 2270 2280 2290 2300 2310 

CTTTGGTTTCAG — AAACCCA — AGGCCCCCCTCATACCAACGTTTCGACT-TTGATTCTTGCCGGTACG- 

II 11 1 II III I III I 1 I I II I I I 1 II I II II 

CTGTG — TAAAGGAAAAGGG'AGAGAGGTATACTTGAGA-CACTGCACCTAATCCAGACCATGGCAAAGAAGG 
5510 5520 5530 5540 5550 5560 5570 

2320 2330 2340 X 

— TGGTGGTGG-GT-GCCTTAGCTCTTTCTCGATAGTTAGAC 

nn n n ii i i n in n 

AAACAACGTGGAGTTGCGTGAAC-CCCCAAAGAAAGTGAGCGGATGGAGTAGCAGCCTGAGAGGTGCGACAG 
5580 5590 5600 5610 5620 5630 5640 


ACCAAATGGACATTTCCTCA 
5650 5660 




PARAMETERS 


Similarity matrix          Unitary 
Mismatch penalty           1 
Gap penalty                1.00 
Gap size penalty           0.05 
Cutoff score               5 
Randomization group        0 

Initial scores to save     20 
Optimized scores to save   20 

K-tuple 
Joining penalty 
Window size 

Alignments to save 
Display context 


SEARCH STATISTICS 


Scores:    Mean           Median         Standard Deviation 
           5              7              1.84 

Times:     CPU                           Total Elapsed 
           00:02:34.02                   00:07:35.00 


Number of residues: 4627393 

Number of sequences searched: 16524 

Number of scores above cutoff: 4313 

Cut-off raised to 6. 

Cut-off raised to 7. 

Cut-off raised to 8. 


The scores below are sorted by initial score. 
Significance is calculated based on initial score. 
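
The significance values in the list that follows are consistent with a simple z-score: the initial score minus the mean, divided by the standard deviation printed under SEARCH STATISTICS (mean 5, standard deviation 1.84). This is an inference from the numbers in this listing, not a documented formula of the search program, and the function name below is illustrative. Because the printed mean and standard deviation are themselves rounded, the computed values agree with the listed ones only to within about 0.01.

```python
def significance(score, mean, stdev):
    """Number of standard deviations by which a score exceeds the mean."""
    return (score - mean) / stdev

# Values printed under SEARCH STATISTICS in this listing.
MEAN, STDEV = 5.0, 1.84

# (initial score, significance printed in the listing)
reported = [(17, 6.51), (16, 5.97), (15, 5.43), (14, 4.89), (13, 4.34)]

for score, printed in reported:
    z = significance(score, MEAN, STDEV)
    print(f"score {score}: computed {z:.2f}, listed {printed:.2f}")
```

The "N standard deviations above mean" banners in the list group entries by the whole-number part of this value.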


A 100% identical sequence to the query sequence was not found. 


The list of best scores is: 


                                                     Init.  Opt. 
    Sequence Name  Description               Length  Score  Score  Sig.  Frame 

             **** 6 standard deviations above mean **** 
 1. A27381   Complement subcomponent C1s pr     688    17     43   6.51    0 
             **** 5 standard deviations above mean **** 
 2. S01292   Tenascin - Chicken (fragment)      697    16     41   5.97    0 
 3. S01845   DNA (cytosine-5-)-methyltransf    1573    16     26   5.97    0 
 4. VWPBD    Coat protein VP1 - Budgerigar      343    15     30   5.43    0 
 5. PS0047   Extracellular serine protease      448    15     44   5.43    0 
 6. A27733   nifA protein - Azotobacter vin     129    15     25   5.43    0 
 7. S01927   Regulatory protein nifA - Azot     522    15     38   5.43    0 
 8. W6WLHS   Probable E6 protein - Papillom     158    15     28   5.43    0 
 9. S04029   Sodium channel protein - Fruit    1321    15     35   5.43    0 
10. D31090   Hydrogen ion-transporting ATP      163    15     22   5.43    0 
             **** 4 standard deviations above mean **** 
11. MNXRW4   Nonstructural protein Pns4 - W     732    14     44   4.89    0 
12. ZLVN     L protein - Vesicular stomatit    2109    14     43   4.89    0 
13. B28392   Penicillin amidase I precursor     558    14     41   4.89    0 
14. DEECDA   Aspartate-semialdehyde dehydro     367    14     25   4.89    0 
15. WMBEH6   UL36 protein - Herpes simplex     3164    14     36   4.89    0 
16. S01165   Achaete-scute locus protein T3     257    14     37   4.89    0 
17. KXBOZ    Protein Z - Bovine                 396    14     34   4.89    0 
18. VCLJB    env polyprotein - Bovine leuke     515    14     44   4.89    0 
19. S06053   Transforming protein (ski) - H     728    14     40   4.89    0 
20. 0QBE6L   Hypothetical BXLF2 protein - E     706    13     47   4.34    0 


The scores below are sorted by optimized score. 

Significance is calculated based on optimized score. 

A 100% identical sequence to the query sequence was not found. 


The list of best scores is: 

                                                     Init.  Opt. 
    Sequence Name  Description               Length  Score  Score  Sig.  Frame 

             **** 5 standard deviations above mean **** 
 1. VGVUPT   Glycoprotein precursor - Punta    1313     9     51   5.18    0 
 2. EGMSMG   Epidermal growth factor precur    1217     9     51   5.18    0 
             **** 4 standard deviations above mean **** 
 3. TVRTNU   Kinase-related transforming pr    1260     9     50   4.71    0 
 4. A30359   Granule membrane protein 140 p     830     7     50   4.71    0 
 5. AHRB     Ig alpha chain C region - Rabb     299     9     50   4.71    0 
 6. KQHUP    Plasma kallikrein precursor -      638     8     50   4.71    0 
 7. GQHUN    Nerve growth factor receptor p     427     8     49   4.24    0 
 8. QRHULD   LDL receptor precursor - Human     860     7     49   4.24    0 
 9. W2WLB2   Probable E2 protein - Bovine p     422     7     49   4.24    0 
10. JL0104   Lymphocyte-associated cell sur     385     3     49   4.24    0 
11. A26850   Hydrogen ion-transporting ATP      489    10     49   4.24    0 
12. VHWVB    Structural polyprotein - Sindb    1245    13     49   4.24    0 
13. S06028   Gene supressor-of-white-aprico     964     7     49   4.24    0 
             **** 3 standard deviations above mean **** 
14. A28455   Cell surface antigen 4F2 heavy     529     8     48   3.77    0 
15. A32375   Lymphocyte surface MEL-14 anti     372     8     48   3.77    0 
16. GNVUUK   Glycoprotein precursor - Uukun    1008    11     48   3.77    0 
17. SYECCP   Carbamoyl-phosphate synthase (    1072     8     48   3.77    0 
18. MHMS     Ig mu chain C region - Mouse       455     8     48   3.77    0 
19. UIBO     Thyroglobulin precursor - Bovi    2769     8     48   3.77    0 
20. A24976   Ig mu chain C region, b allele     455     8     48   3.77    0 


1. ELLIS-267-3A 

VGVUPT Glycoprotein precursor - Punta Toro virus 


ENTRY        VGVUPT #Type Protein 
TITLE        Glycoprotein precursor - Punta Toro virus 
INCLUDES     glycoprotein NS-M\ glycoprotein G1\ glycoprotein G2 
DATE         27-Nov-1985 #Sequence 27-Nov-1985 #Text 31-Dec-1989 
PLACEMENT    1707.0 1.0 1.0 1.0 1.0 
SOURCE       Punta Toro virus 
ACCESSION    A04109 
HOST         #Common-name mosquitos 
             Homo sapiens #Common-name man 
REFERENCE    (Sequence translated from the RNA sequence) 
  #Authors   Ihara T., Smith J., Dalrymple J.M., Bishop D.H.L. 
  #Journal   Virology (1985) 144:246-259 
COMMENT      This virus is a member of the family Bunyaviridae. 
SUPERFAMILY  #Name phlebovirus glycoprotein 
KEYWORDS     glycoprotein\ transmembrane protein 
FEATURE 
  1-270          #Protein glycoprotein SN-M (SNM)\ 
  271-809        #Protein glycoprotein G1 (GG1)\ 
  810-1313       #Protein glycoprotein G2 (GG2)\ 
  76,1021,1243   #Binding-site carbohydrate (possible) 
SUMMARY      #Molecular-weight 146374 #Length 1313 #Checksum 4967 
SEQUENCE 


Initial Score    = 9     Optimized Score = 51    Significance = 5.18 
Residue Identity = 23%   Matches         = 68    Mismatches   = 172 
Gaps             = 53    Conservative Substitutions           = 0 
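
The Residue Identity figures printed with the alignments in this listing are consistent with the number of matches divided by the full gapped alignment length (matches plus mismatches plus gaps), with the percentage truncated to a whole number. This relation is inferred from the counts printed here rather than taken from the search program's documentation, and the function name is illustrative.

```python
def residue_identity(matches, mismatches, gaps):
    """Percent identity over the gapped alignment, truncated to an integer."""
    aligned_length = matches + mismatches + gaps
    return (100 * matches) // aligned_length

# Counts printed in this listing for two of the alignments.
print(residue_identity(68, 172, 53))   # VGVUPT: listed as 23%
print(residue_identity(66, 174, 47))   # A30359: listed as 22%
```

The second case is the informative one: 66/287 is 22.997%, so the listed 22% indicates truncation rather than rounding.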


X 10 20 30 40 50 

MGNNCYNVVV I VLLLVGCEK VGAVQNSCDNCQPG — TFCRKY NPVCKSCPPSTFS 

I > I I III I I I I I III 

’ > 'I ill i i i » i ill 


TNVSFVCYEHVGQDEQEVEHRALKRVSVNDCK I VDNSKQK I CTGDHVFCEK YDCSTSYPDVTC I HAPGSGPL 
500 X 510 520 530 540 550 560 


60 70 80 90 100 110 

S I GGQPNCN I CRVCAGY FRFKKFCSSTHNAEC EC- I EGFHCLGPQCTRCEKDCRPGQELT — K 

1 till II i i i I l i ill 

* i i i i it i i i i i i lit 

Y I — NLMGSW I KPQCVGYERVLVDREVKQPLLAPEQNCDTCVSECLDEGVH 1 KSTGFE I TSA 

570 580 590 600 610 620 


120 130 

QGCK T CSLGTFNDQNGT- 


140 150 160 

-GVCRPWTNCSLDGRSVLKTGTTEKDVVCGPPVVSFSP — 


170 180 

•STT I SVTPEGGPGG 


VACSHGSC I SAHQEPSTSV I VPYPGLLASVGGR I G I HLSHT— 
630 640 650 660 


SDSASVHMVVVCPPRDSCAAHNCLLCYHG I 
670 680 690 


190 200 210 220 230 240 

HSLQ— VLTLFLALTSALLLAL I F I TLLFSVLKW I RKKFPH I FKQPFKKTTG AAQEEDACSC 

• it i i i i i ii til i i i l l i i 


LNYQCHSTLSA I LTSFLL — I LF I YTVFSVTTN I LYVLRL I PKQ— LKSPVGWLKLF I NWLLTALR I KTRNVM 
700 710 720 730 740 750 760 


250 X 

RCPQEEEGGGGGYEL 

i i 

RR I NQR I GWVDHHDVERPRHREPMR 
770 780 790 


2. ELLIS-267-3A 

EGMSMG Epidermal growth factor precursor - Mouse 


ENTRY        EGMSMG #Type Protein 
TITLE        Epidermal growth factor precursor - Mouse 
DATE         30-Nov-[illegible] #Sequence 11-Aug-1983 #Text 31-Dec-1989 
PLACEMENT    575.0 1.0 1.0 1.0 1.0 
SOURCE       Mus musculus #Common-name house mouse 
ACCESSION    A01387 
REFERENCE    (Sequence translated from the mRNA sequence) 
  #Authors   Scott J., Urdea M., Quiroga M., Sanchez-Pescador R., 
             Fong N., Selby M., Rutter W.J., Bell G.I. 
  #Journal   Science (1983) 221:236-240 
  #Comment   The cleavage site for the signal sequence is not 
             known. 
  #Comment   The precursor sequence contains seven regions that 
             are similar to the epidermal growth factor 
             sequence: residues 357-399, 400-440, 441-480, 
             745-784, 832-885, 886-925, and 926-976. 
REFERENCE    (Sequence of residues 1-1168 translated from the 
             mRNA sequence) 
  #Authors   Gray A., Dull T., Ullrich A. 
  #Journal   Nature (1983) 303:722-725 
  #Comment   This sequence differs from residues 1-1133 of that 
             shown in having 790-Tyr and 1048-Ser. It differs 
             greatly from residues 1134-1168 of that shown due 
             to an insertion of one base in the nucleotide 
             sequence with respect to the nucleotide sequence 
             of Scott, et al., which causes a shift in the 
             reading frame. 
  #Comment   There are sequence homologies between residues 
             321-360, 361-401, 402-442, 443-482, 746-786, 
             837-875, 876-917, 918-958, and 978-1018. 
REFERENCE    (Active protein, complete sequence of residues 
             977-1029 with experimental details) 
  #Authors   Savage Jr C.R., Inagami T., Cohen S. 
  #Journal   J. Biol. Chem. (1972) 247:7612-7621 
  #Comment   Residues 1024-1029 are not required for full 
             biological activity in vivo. 
REFERENCE    (Disulfide bonds) 
  #Authors   Savage Jr C.R., Hash J.H., Cohen S. 
  #Journal   J. Biol. Chem. (1973) 248:7669-7672 
  #Comment   Disulfide bonds link residues 982-996, 990-1007, and 
             1009-1018. 
COMMENT      The active growth factor from this submaxillary 
             gland protein stimulates the growth of various 
             epidermal and epithelial tissues in vivo and in 
             vitro and of some fibroblasts in cell culture. 
SUPERFAMILY  #Name epidermal growth factor 
SUMMARY      #Molecular-weight 133143 #Length 1217 #Checksum 9280 
SEQUENCE 


Initial Score    = 9     Optimized Score = 51    Significance = 5.18 
Residue Identity = 23%   Matches         = 69    Mismatches   = 162 
Gaps             = 60    Conservative Substitutions           = 0 


10 20 30 40 50 60 


MGNNCYNVVV I VLLLVGCEK VGAVQNSCDNCQPGTFCRK YNPVCKSCPPSTFSS I GGQPNCN 


SCFD I DECQRGAHNCAENAACTNTEGGYNCTCAGRPS SPGRSC PDST APSLLGEDGHHLDRN 

920 X 930 940 950 960 970 


70 80 90 100 1 lO 

I CRVC AGYFRFKKFC SSTHNAECEC I EGFHCLGP9CTRCEKDCR PGQELTKQGCKTC 

* • • < • ill i i i i i it i 

' * * ■ i til i t t i i it , 

SYPGCPSSYDGYCLNGGVCMH I ESLDSYTCNCV I GYSGDRCQ— TR DLRWWELRHAGYGQKHD I MWAVC 

980 990 1000 1010 1020 1030 1040 


120 130 140 150 160 170 

SLGTFNDQNGTGVCR PWTN-CSLDGRSVLKTGTTEKD WCGPPVVSFSPSTT I SVTPEGGP 


MVALVLLLLLGMWGTYYYRTRKBLSNPPhgSPCDEPSGSVSSSG PDSSSGAAVASCPQPWFWLEKHQDP 

1050 1060 1070 1080 1090 1100 1110 


180 190 200 2 1 0 220 230 240 

GGHSLQVLTLFLALTSALLLAL I F I TLLFSV— LKW I RKKFPH I FKQPFKKTTGAAQEEDACSCRCPQEEEG- 

11 1 1 > i i t < i i i i i ii i i | | | | | 

' 1 ' • i i i i i i i t i i ii i l i i t t i 

KNGSLPADGTNGAVVDA GLSPSLQLGSVHLTSWRQK-PH I DGMGTGQSCW I PPSSDRGPQE I EGN 

1120 1130 1140 1150 1160 1170 


250 X 

GGGGYEL 

i i 
i i 

SHLPSYRPVGPEKLHSLQSANGS 
1180 1190 1200 


3. ELLIS-267-3A 

TVRTNU Kinase-related transforming protein precursor (neu 


ENTRY        TVRTNU #Type Protein 
TITLE        Kinase-related transforming protein precursor (neu) 
             - Rat #EC-number 2.7.1.- 
DATE         31-Dec-1988 #Sequence 31-Dec-1988 #Text 31-Dec-1988 
PLACEMENT    197.0 15.0 2.0 1.0 2.0 
SOURCE       Rattus norvegicus #Common-name Norway rat 
ACCESSION    A24562 
REFERENCE    (Sequence translated from the mRNA sequence) 
  #Authors   Bargmann C.I., Hung M.C., Weinberg R.A. 
  #Journal   Nature (1986) 319:226-230 
  #Title     The neu oncogene encodes an epidermal growth factor 
             receptor-related protein. 
GENETIC 
  #Name      neu 
SUPERFAMILY  #Name kinase-related transforming protein 
KEYWORDS     transforming protein\ tyrosine-specific protein 
             kinase 
FEATURE 
  1-19           #Domain signal sequence (SIG)\ 
  20-1260        #Protein kinase-related transforming 
                 protein neu (KTP)\ 
  658-680        #Domain transmembrane (TMN)\ 
  731-986        #Domain tyrosine-specific protein kinase 
                 (TPK)\ 
  71,191,263,535,576, 
  634,763,1146,1231   #Binding-site carbohydrate (Asn) 
                      (possible)\ 
  691,882,1227,1253   #Modified-site phosphorylation 
SUMMARY      #Molecular-weight 139219 #Length 1260 #Checksum 5917 
SEQUENCE 


Initial Score    = 9     Optimized Score = 50    Significance = 4.71 
Residue Identity = 23%   Matches         = 69    Mismatches   = 158 
Gaps             = 67    Conservative Substitutions           = 0 


X 

io 

20 

30 

40 


MGNNCYNVVV I V LLLVG CEK VGAVQNSCDNCQPGTFCRK YNPV 


RELGSGLAL I HRNAHLCFVHTVPWDQLFRNPHQALLHSGNRPEEDLCVSSGLVCNS — LCAHGHCWGPGPTQ 
470 X 480 490 500 510 520 530 


50 60 70 80 90 1 OO 

CKSCPPSTFSS I GGQPNCN I CRVCAGYFR FKKFCSSTHNAECEC I EGFHCLG PQCTRC — EKDCRP 


CVNC — SHF — LRGQECVEECRVWKGLPREYVSDKRCLPCHPECQPQNSSETCFGSEADQCAACAHYKDSSS 
540 550 560 570 580 590 600 



110 120 
GQELTKQGCK — TCSLGTI 


30 


140 150 1 GO 170 

r NCSLDGRSVLKTGTTEKDWCGPPWSFSPSTT I SVTPEG 

• i I t i i i i i i 

i > i i i i i i t i 


CVARCPSGVKPDLSYMP I WKYPDEEG I CQPCP I NCTHSCVDL 
G10 620 G30 G40 


•DERGCPAEORASPVTFI I ATVEG 
650 GGO 


180 190 200 2 1 0 220 230 240 

GPGGHSLGVLTLFLALTSALLLAL I F I TLLFSVLKW I RKKFPH I FK0 PFKKTTGAAQEEDACSCRCPQ 

* * ill i ii ill I it il 

• • ill i il ill i il ii 


VL-LFL I L-VVVVG I L I • 
G70 680 


KRRRQK I RKYTMRRLLOETELVEPLTPSGAMPNQA— OMR I LK 


690 


700 


710 


720 


250 X 

EEE GGGGGYEL 

It li 

il il 

ETELRK VK VLGSGAFGTVYKG I W I PD 
730 740 


4. ELLIS-267-3A 

A30359 Granule membrane protein 140 precursor - Human 


ENTRY        A30359 #Type Protein 
TITLE        Granule membrane protein 140 precursor - Human 
SOURCE       Homo sapiens #Common-name man 
ACCESSION    A30359 
REFERENCE    (Sequence translated from the mRNA sequence) 
  #Authors   Johnston G.I., Cook R.G., McEver R.P. 
  #Journal   Cell (1989) 56:1033-1044 
  #Title     Cloning of GMP-140, a granule membrane protein of 
             platelets and endothelium: sequence similarity to 
             proteins involved in cell adhesion and 
             inflammation. 
FEATURE 
  1-41           #Domain signal sequence (SIG)\ 
  42-830         #Protein granule membrane protein 140 
                 (MAT)\ 
  42-159         #Domain lectin (LEC)\ 
  160-199        #Domain EGF (EGF)\ 
  772-795        #Domain transmembrane (TMN)\ 
  54,98,180,212,219,411, 
  460,518,665,716,723, 
  741            #Binding-site carbohydrate (Asn)\ 
  200-770        #Domain complement H/C4b-binding (COM) 
COMMENT      THIS SEQUENCE HAS NOT BEEN COMPARED TO THE 
             NUCLEOTIDE TRANSLATION. 
SUMMARY      #Molecular-weight 90766 #Length 830 #Checksum 2552 
SEQUENCE 


Initial Score    = 7     Optimized Score = 50    Significance = 4.71 
Residue Identity = 22%   Matches         = 66    Mismatches   = 174 
Gaps             = 47    Conservative Substitutions           = 0 


10 20 30 40 50 60 


MGNNC YN W V I VLLL VGCEK VG A V0NSCDNCOPGTFCRK YNPVCK SCPPSTFSS I GG0PNCN 


NEARVNCSHPFGAFRYOSVCSFTCNEGLLL VGA- 
460 X 470 480 


-SVL0CLATGNWNSVPPEC0A I PCTPLLS — P0NGTM 


490 500 510 520 


70 80 90 

I CRVCAGYFRFKKFCSSTHNAECEC I EGFHCLGP- 


100 HO 

-0CTR — CEK DCRPG0ELTK 


120 

0GCKTCS— 


•OF I CDEGYSLSGPERLDCTRSGRWTDSPPMCEA I KCPELFAPEOGSLDCSD 
540 550 560 570 580 


TCVOPLGSSSYKS I C 
530 



-LGTFN- 


130 140 150 160 170 180 

D Q N G'ggy{^^ 1 f e |^C(^ f ^G R SV L K T G TTE K D W C GPPWS F SPS TT I SVTPEGGPGGH 

■ ■ ’ 11 » III til I • l i i l 

1 ‘ * 11 • iti III i >l ill 

TRGEFNVGSTCHFSCNNGFKLEGPNNVECTTSGR— WSATPPTCKG I ASLPTPGLQCPALT TPGGGTMYC 

590 600 610 620 630 640 650 


1 90 200 2 1 0 220 230 240 

SLQVLTLFLALTSALLLAL I F I TLLFSVLKW I RKKFPH I FKQP FKKTTGAAQEEDA — CSCRCPQEEE 

•il ii i , • iti 

' 1 * 'I * i l III 

RHHPGT— FGFNTTCYFGCNAGFTL I GDSTLSCRPSGQWTAVTPACRAVKCSELHVNKP I AMNCSNLWGNFS Y 
660 670 680 690 700 710 720 


250 X 

G— GGGGYEL 

i i 1 

I I t 

GS I CSGHCLEGQLLNGSAQ 
730 X 740 


5. ELLIS-267-3A 

AHRB Ig alpha chain C region - Rabbit (fragment) 


ENTRY        AHRB #Type Protein (fragment) 
TITLE        Ig alpha chain C region - Rabbit (fragment) 
DATE         28-Aug-1985 #Sequence 28-Aug-1985 #Text 30-Jun-1989 
PLACEMENT    696.0 13.0 3.0 1.0 1.0 
SOURCE       Oryctolagus cuniculus #Common-name domestic rabbit 
ACCESSION    A02174 
REFERENCE    (Sequence translated from the mRNA sequence) 
  #Authors   Knight K.L., Martens C.L., Stoklosa C.M., 
             Schneiderman R.D. 
  #Journal   Nucleic Acids Res. (1984) 12:1657-1670 
COMMENT      This immunoglobulin belongs to the IgA-g subclass. 
             It was isolated from a rabbit homozygous for a2, 
             n80, de12,15, f71, g75 heavy chain haplotype. 
SUPERFAMILY  #Name immunoglobulin C region 
KEYWORDS     immunoglobulin\ plasma protein 
SUMMARY      #Length 299 #Checksum 2361 
SEQUENCE 


Initial Score    = 9     Optimized Score = 50    Significance = 4.71 
Residue Identity = 23%   Matches         = 69    Mismatches   = 153 
Gaps             = 70    Conservative Substitutions           = 0 


X 10 20 30 40 50 60 

MGNNCYNVVV I VLLLVGCEK VGAVQNSCDNCQPGTFCRK YNPVCKSCPPSTFSS I GGQPNC- 


— NICRVCA 


QSGTSGPYT ACSEL I LPVTQCLG — QKS-AAC HVEYNSV I NESLPVPF PDCCPANSCCTC- 

X 10 20 30 40 50 


70 80 90 100 110 120 130 

GYFRFKKFCSSTHNAECEC I EGFHCLGPQCTRCEKDCRPGQELTKQGCK — TCSLGTFNDQNGTGVCRPWTN 

1 1 * * i ill lit 


-PSSSSRNLISGCGPSLSLGRPDLGDLLLGRDASLTCTLSGLKNPEDAVFTWEPTNGNEPVGQRAQ 
60 70 80 90 100 110 120 


140 150 160 170 180 190 

CSLDG RSVL KTGTTEKDVVCGPPVVSFSPSTT I S VTPEGGPGGHSLQVLTLFLALTSA 


RDLSGCYSVSSVLPSSAETWK ARTEFTCT VTHPE I DSGSLT AT I SRGVVTP PQVHLLPPPSEELALNEQ 

130 140 150 160 170 180 190 


200 2 1 0 220 230 240 250 

LLLAL I F I TLL FS VLKW I R KKFPH I FKQP FKKTTGAAQEEDACSCRCPQEEEGGG 


-TCLVRGFSPKDVLVSWRHQGQEVPEDSFLVWKSMPESSQDKATYA I TSLLRVPAEDWNQG 


VTL 


X 

GGYEL 


DTYSCMVGHEGLAEH 

260 


6. ELLIS-267-3A 

KQHUP Plasma kallikrein precursor - Human #EC-number 


ENTRY        KQHUP #Type Protein 
TITLE        Plasma kallikrein precursor - Human #EC-number 
             3.4.21.34 
ALTERNATE-NAME plasma prekallikrein\ kininogenin 
DATE         13-Aug-1986 #Sequence 13-Aug-1986 #Text 13-Aug-1986 
PLACEMENT    356.0 4.0 2.0 1.0 1.0 
SOURCE       Homo sapiens #Common-name man 
ACCESSION    A00921 
REFERENCE    (Sequence translated from the mRNA sequence) 
  #Authors   Chung D.W., Fujikawa K., McMullen B.A., Davie E.W. 
  #Journal   Biochemistry (1986) 25:2410-2417 
COMMENT      This protein, synthesized in the liver, circulates 
             as a noncovalent complex with high molecular 
             weight (HMW) kininogen. 
COMMENT      The zymogen is activated by factor XIIa, which 
             cleaves the molecule into a light chain, which 
             contains the active site, and a heavy chain, which 
             associates with HMW kininogen. These chains are 
             linked by one or more disulfide bonds. 
COMMENT      The enzyme cleaves Lys-Arg and Arg-Ser bonds. It 
             activates, in a reciprocal reaction, factor XII 
             after its binding to a negatively charged surface. 
             It also releases bradykinin from HMW kininogen and 
             may also play a role in the renin-angiotensin 
             system by converting prorenin into renin. 
SUPERFAMILY  389-621 #Name trypsin 
KEYWORDS     hydrolase\ serine proteinase\ glycoprotein\ plasma\ 
             blood coagulation\ fibrinolysis\ inflammation\ 
             liver\ duplication 
FEATURE 
  1-19           #Domain signal sequence (SIG)\ 
  20-390,391-1   #Protein plasma kallikrein, heavy and 
                 light chains (MPT)\ 
  389-621        #Domain (or 383-625) serine proteinase 
                 (TRY)\ 
  20-104,110-194, 
  200-284,291-375  #Duplication\ 
  434            #Active-site His\ 
  483            #Active-site Asp\ 
  578            #Active-site Ser\ 
  127,308,396,453,494  #Binding-site carbohydrate (Asn) 
SUMMARY      #Molecular-weight 71369 #Length 638 #Checksum 585 
SEQUENCE 


Initial Score    = 8     Optimized Score = 50    Significance = 4.71 
Residue Identity = 23%   Matches         = 70    Mismatches   = 164 
Gaps             = 69    Conservative Substitutions           = 0 


X 

MGNNC- 


lO 20 30 40 50 

-YNVVV I VLLLVGCEK VGAVQNSCDNCQPGTFCRK YNPVCKSCPPST 


DAFVCRT I CTYHPNCLFFTFYTNVWK I ESQRNVCLLK TSE— SGTPSSS — TPQENT I SGYSLLT CKRTLPEP 
230 X 240 250 260 270 280 290 


FSS 


I GGQPNCN I CRVCAGYFRFKK---FC< 


--FCSSTHN AECEC I EGFHCLGPQCTRCEKDCRPGQELTKQGCK 


CHSK I YPGVDFGGEELNV- 
300 


TFVKGVNVCQETCTKM I RCBFFTYSLLPEDCKEEK-CKCFLRLSMDGSP 

3 1 O 320 330 340 350 


120 130 140 150 1 GO 170 

— CSLGTFNDQNGTG VCRPWTNCSLDGRSVLKTGTTEKD VVCGPPV VSFSPSTT I SVTPEGGP 


TR I AYGTQGSSGYSLRLCNTGDNSVCTTKT— 
3G0 370 380 


-STR I V GGTNSSWGEWPWQVSLQVKLT AQRHLCGGS 

390 400 410 420 


180 190 200 210 220 230 

GGHSLQVLT LFL ALTSALL-LAL I F I TLLFSVLKW I RKKFPH I FKQPFKKTTG A AQE 

* * • i i i i tilt ill i i 

* 1 1 11 till i i i i ill i i 

L I GHQWVLT AAHCFDGLPL0DVWR I YSG I LNLSD I TKDTPFSQ I KE I 1 1 H0NYK VSEGNHD I AL I K 

430 440 450 4G0 470 480 

240 250 X 

EDA CSCRCPQEEEGGGGGYEL 

l i I 

i i i 

LQAPLNYTEFQKP I CLPSKGDTST I YTNCWVTGWG 
490 500 510 X 520 


7. ELLIS-267-3A 

GQHUN Nerve growth factor receptor precursor - Human 


ENTRY        GQHUN #Type Protein 
TITLE        Nerve growth factor receptor precursor - Human 
ALTERNATE-NAME NGF receptor 
DATE         31-Mar-1988 #Sequence 31-Mar-1988 #Text 31-Mar-1988 
PLACEMENT    580.0 1.0 1.0 1.0 1.0 
SOURCE       Homo sapiens #Common-name man 
ACCESSION    A25218 
REFERENCE    (Sequence translated from the mRNA sequence) 
  #Authors   Johnson D., Lanahan A., Buck C.R., Sehgal A., Morgan 
             C., Mercer E., Bothwell M., Chao M. 
  #Journal   Cell (1986) 47:545-554 
  #Title     Expression and structure of the human NGF receptor. 
COMMENT      This receptor is found on sensory and sympathetic 
             neurons, on neuroblastoma cells, and on a variety 
             of nonneuronal derivatives of the neural crest. 
COMMENT      The duplicated cysteine-rich region of the 
             extracellular domain may form part or all of the 
             NGF-binding site. The active form of NGF is a 
             noncovalent dimer of identical chains. 
COMMENT      Although structurally similar, this receptor differs 
             from other growth factor receptors in that its 
             cytoplasmic domain is not homologous to known 
             tyrosine or serine/threonine protein kinases. 
             Although apparently lacking intrinsic kinase 
             activity, it is phosphorylated on serine. 
COMMENT      This receptor undergoes both N- and O-linked 
             glycosylation. 
GENETIC 
  #Map-position 17q21-q22 
  #Name      NGFR 
SUPERFAMILY  #Name nerve growth factor receptor 
KEYWORDS     receptor\ integral membrane protein\ glycoprotein\ 
             duplication 
FEATURE 
  1-28           #Domain signal sequence (SIG)\ 
  29-427         #Protein nerve growth factor receptor 
                 (MAT)\ 
  29-250         #Domain extracellular (EXT)\ 
  [illegible feature entry]\ 
  29-190         #Region cysteine-rich\ 
  197-248        #Region serine/threonine-rich\ 
  251-272        #Domain transmembrane (MEM)\ 
  273-427        #Domain cytoplasmic (CYT)\ 
  60             #Binding-site carbohydrate (Asn) 
                 (putative) 
SUMMARY      #Molecular-weight 45183 #Length 427 #Checksum 7426 
SEQUENCE 


Initial Score    = 8     Optimized Score = 49    Significance = 4.24 
Residue Identity = 22%   Matches         = 66    Mismatches   = 173 
Gaps             = 54    Conservative Substitutions           = 0 


X 10 20 30 40 50 

MGNNCYNVVV I VLLLVGCEK VGAVQ--NSCDNCQPG--TFCRKYNPVCKSCPPS 


PCTECVGLQSMSAPC VEADDAVCRCAYGYYQDETTGRCEACRVCEAGSGLVFSCQDKQNTVCEECPDG 

90 X 100 110 120 130 140 150 

60 70 80 90 100 110 

TFS-S I GGQPNCN I CRVCAG YFRFKKFCSSTHNAECEC I EG FHCLGPQCT RCEKDCRPGQEL— 


TYSDEANHVDPCLPCTVCEDTERQLRECTRWADAECEE I PGRW I TRSTPPEGSDST APSTQEPEAPPEQDL I 


160 170 180 190 200 210 220 

120 130 140 150 160 170 180 


— TKQGCKTCSLGTFNDQNGTGVCRPWT— NCSLDGRSVLKTGTTEKDWCGPPWSFSPSTTISVTPEGGPG 


ASTVAGWT TVMGSSQP WTRGTTDNL I PVYCS I L AAWVG— LVAY I AFKRWNS— CKQNKQG 


230 240 250 260 270 280 

190 200 210 220 230 240 


GHSLQVLTLFLALTSALLLAL I F I TLLFSVLKW I RKKFPH I FKQPFKKTTG AAQEE DACSC 


ANSRPV— NQTPPPEGEKLHSDSG I SVDSQSLHDQQPHTQT ASGQALKGDGGLYSSLPPAKREEVEKLLNGSA 


290 300 310 320 330 340 350 


250 X 

RCPQEEEGGGGGYEL 


GDTWRHLAGELGYQPEH I DSFTHEA 
360 370 380 


8. ELLIS-267-3A 

QRHULD LDL receptor precursor - Human 

ENTRY        QRHULD #Type Protein 
TITLE        LDL receptor precursor - Human 
DATE         17-May-1985 #Sequence 17-May-1985 #Text 28-May-1986 
PLACEMENT    574.0 1.0 1.0 1.0 1.0 
SOURCE       Homo sapiens #Common-name man 
ACCESSION    A01383 
REFERENCE    (Sequence translated from the mRNA sequence) 
  #Authors   Yamamoto T., Davis C. G., Brown M. S., Schneider W. J., 
             Casey M. L., Goldstein J. L., Russell D. W. 
  #Journal   Cell (1984) 39:27-38 
COMMENT      This transmembrane glycoprotein binds LDL, the major 
             cholesterol-carrying lipoprotein of human plasma, 
             and transports it into cells by endocytosis. In 
             order to be internalized, the receptor-ligand 
             complexes must first cluster into clathrin-coated 
             pits. 
COMMENT      The amino end of the extracellular domain contains 
             seven or eight 40-residue repeats. Each repeat has 
             about six Cys residues, all of which are 
             involved in disulfide bonds. Following these 
             repeats is a region of about 350 residues that is 
             homologous with part of the epidermal growth 
             factor (EGF) precursor. 
COMMENT      The last half of the extracellular domain contains 
             structural evidence of repetitive sequence in the 
             similarity of residues 441-445, 488-492, 531-535, 
             575-579, and 617-621. 
COMMENT      An intrastrand recombination event between two Alu 
             sequences in the 3' untranslated region of mRNA 
             from a familial hypercholesterolemia patient 
             results in the deletion of the transmembrane and 
             cytoplasmic domains. Most of the receptors 
             produced are secreted, but those that adhere to 
             the cell surface cannot cluster in coated pits; 
             therefore, even though they bind LDL, these 
             receptor-ligand complexes are not internalized. 
SUPERFAMILY  #Name LDL receptor 
KEYWORDS     glycoprotein\ LDL\ cholesterol\ lipid transport\ 
             endocytosis\ coated pits\ transmembrane protein\ 
             receptor 
FEATURE 
   22-860                   #Protein LDL receptor <MAT>\ 
   1-21                     #Domain signal sequence <SIG>\ 
   22-788                   #Domain extracellular <EXT>\ 
   22-61,62-102,103-141, 
     142-180,191-229, 
     230-268,269-309        #Duplication\ 
   311-661                  #Region EGF precursor homology\ 
   721-768                  #Region clustered O-linked 
                              oligosaccharides\ 
   789-810                  #Domain transmembrane <TMM>\ 
   811-860                  #Domain cytoplasmic <CYT> 
SUMMARY      #Molecular-weight 95375 #Length 860 #Checksum 3641 
SEQUENCE 


Initial Score = 7 Optimized Score = 49 Significance = 4.24 

Residue Identity = 23% Matches = 67 Mismatches = 162 

Gaps = 56 Conservative Substitutions = 0 


X 10 20 

MGNNCYN WV I VLLLVGCEK VGAVQNSC— 


30 40 50 60 

-DNCQPGTFCRK YNP VCKSCPPSTFSS I GGQPNCN I CRVCA 


MGPWGWKLRWTVALL— LAAAGT AVGDRCERNEF0C0DG— KC I SYKWVCDGSAEC0DGSDES0ETCLSVTCKS 
X 10 20 30 40 50 60 70 


70 80 90 100 110 120 130 

GYFRFKKFCSSTHNAECEC I EGFHCLGPQCTRC — EKDCRPGQELTKQGC — KTCSLGTFNDQNGTGVCRPW 


GDF SCGGRVN RC I PQFWRCDGQVDCDNG -- SDEQGCPPKTCSQDEFRCHDGKC I SR0F 

80 90 100 110 120 


140 150 160 170 180 190 

TNCS LDG RSVLKTGTTEKDWCGPP WSFSPSTT I SVTPEGGPGGHSLQVLTL-F 

• ill it ill ill ii i 

i ill ii ill ill ii i 

VCDSDRDCLDGSDEASCPVL TCGPASFQCNSSTC I POLWACDNDPDCEDGSDEWPQRCRGL YV 


130 140 150 160 170 180 

200 210 220 230 240 250 


LALTSALLLAL I F I TL — LFSVLKW I RKKFP-H I FKOPFKKTTGAA0EEDACSCRCP0EEEGGGGGYEL 

» 1 i i l ii ill i 

» • i i i ii til • 

F0GDSSPCSAFEFHCLSGEC I HSSWRCDGGPDCKDKSDEENCAVATCRPDEFQCSDGNC I HGSR0CDREYDC 
190 200 210 220 230 240 250 260 


KDMSDEV 


9. ELLIS-267-3A 

W2WLB2 Probable E2 protein - Bovine papillomavirus (type 

ENTRY        W2WLB2 #Type Protein 
TITLE        Probable E2 protein - Bovine papillomavirus (type 2) 
DATE         31-Mar-1989 #Sequence 31-Mar-1989 #Text 31-Mar-1989 
PLACEMENT    1269.0 7.0 1.0 2.0 1.0 
SOURCE       bovine papillomavirus 
ACCESSION    D31169 
REFERENCE    (Sequence translated from the DNA sequence) 
  #Authors   Groff D. E., Mitra R., Lancaster W. D. 
  #Citation  submitted to GenBank, May 1988 
COMMENT      The DNA sequence was obtained from GenBank, release 
             57.0. 
COMMENT      This virus is a member of the family Papovaviridae. 
SUPERFAMILY  #Name papillomavirus E2 protein 
KEYWORDS     early protein 
SUMMARY      #Molecular-weight 46877 #Length 422 #Checksum 6025 
SEQUENCE 


Initial Score = 7 Optimized Score = 49 Significance = 4.24 

Residue Identity = 23% Matches = 66 Mismatches = 169 

Gaps = 50 Conservative Substitutions = 0 
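
The Residue Identity figures reported with each alignment appear consistent with the number of matches divided by all aligned columns (matches, mismatches and gaps), truncated to a whole percent. The sketch below is only an inference from the reported numbers, not a documented formula of the search program; the function name `residue_identity` is hypothetical.

```python
def residue_identity(matches: int, mismatches: int, gaps: int) -> int:
    """Percent identity over all aligned columns, truncated to an integer.

    Assumption: the denominator counts matches, mismatches and gaps.
    This is inferred from the reported figures, not taken from the program.
    """
    columns = matches + mismatches + gaps
    if columns == 0:
        raise ValueError("alignment has no columns")
    # Integer division truncates, which is what the reported values suggest.
    return (100 * matches) // columns


# Figures from the alignment summary directly above.
print(residue_identity(66, 169, 50))
```

With the summary above (66 matches, 169 mismatches, 50 gaps) this gives 23, in agreement with the reported 23%.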

X 10 20 30 40 50 

MGN — NCYNVVV I VLLLVGCE K VGAVQNSCDNCQPGTFCRKYNPVCKSCPPSTFSS 

1 * • >1 > >11 l II , , 

• • • >i > ill l it ii 

KGARWEVEFDGNASNTNWYTVYSKLYMRTEDGWQLAKAGADGTGLYYCTMAGAGRIY— YSRFGEEAARFST 
130 X 140 150 160 170 180 190 


60 70 80 90 100 110 120 

I GGQPNCN I CRVCAGYFRFKKFCSSTHNAECEC I EG— FHCLGPQCTRCEKDCRPGQ ELTKQGCKTCSL 

• l I l I III i ll ill l , 

• 1*11 Ill I || III I I 


TGH YS VRDQDR V Y AG VSSTSSDFRDRPDGVSASEGPEGDPAGKEAEPAQPVSSLLGSPACVP I R A 

200 210 220 230 240 250 


130 140 150 160 170 

GTFNDQNGTGVCRPWTNCSLDGRSVLKTGTTEKD WCGP-PV VSFSP-ST-T I SVT PEG 

! ' • iii i i i i i i iiii it 

' * ‘ iii i i i > i i iiii it 

GLGWVRDG-PRPHPYHFPAGSGGSLLRSAST PVQGPVPVDLAPRQEEEENQSPDSTEEEPVTVPRHTSD 

260 270 280 290 300 310 320 


180 190 200 210 220 230 240 

GPGGHSLQVLTLFLALTSALLLAL I F I TLLFSVLKW I RKKFPH I FKQPFKKTTGAAQEEDACSCRCPQEEEG 

1,1 III i ill! ill i ii 

I*' ill i iiii ill i ii 

ADGFHLLK AGQSCFAL I S — GSANQVKCYRFRVKKNHRHRYENCTTTSF TVA DNGAERQGQAQ I L 

330 340 350 360 370 380 


250 X 

GGGGYEL 

l 

l 

I TFGSPGQRQDFLKHVP 
390 X 400 


10. ELLIS-267-3A 

JL0104 Lymphocyte-associated cell surface molecule - Huma 


ENTRY 

TITLE 

SOURCE 

ACCESSION 

REFERENCE 


JL0104 #Type Protein 

Lymphocyte-associated cell surface molecule - Human 

Homo sapiens #Common-name man 

JL0104 



-rrnvrfi >_■ t iwi w 


t i 


xoaauo o. ri. ? c.i I i» u i • u. 9 Utfllltsu i m. U. 


# Journal 
#T i t 1 e 


#Mo 1 ecu 1 e- 
#Res i dues 
^Comment 


GENETIC 

#Map-pos i t i 
KEYWORDS 
FEATURE 
1-51 

52-385 

52-345 
346-368 
369-385 
73, 1 17, 190, 
284 , 324 

377 , 380 


, A ^ 1 i£s 6 vv^iil.bgBiy che c ' K 

J. Exp. Med. (1989) 170:123-133 

Isolation and chromosomal localization of cDNAs 
encoding a novel human lymphocyte cell surface 
molecule, LAM-1. Homology with the mouse 
lymphocyte homing receptor and other human 
adhesion proteins. 

-type mRNA 

1 -385 < TED > 

The sequence shown here is composed of multi 

homologous domains. One domain is homologous with 
animal lectins, one is homologous with epidermal 
growth factor, and two short consensus repeat 
units similar to those found in C3/C4 binding 
proteins. 

; i on 1 q22— 25 

membrane protein\ glycoprotein\ adhesion protein 

#Domain signal sequence (predicted) <SIG>\ 

#Protein lymphocyte-associated cell 
surface molecule (predicted) <MAT>\ 
#Domain extracellular (probable) <EXT>\ 
#Domain transmembrane (probable) <TMM>\ 
#Domain cytoplasmic tail <CYT>\ 

245, 259, 

#Binding-site carbohydrate (Asn) (potential)\ 

#Modified-site phosphorylation (Ser) (probable) 

#Molecular-weight 43743 #Length 385 #Checksum 4445 


Initial Score = 9 Optimized Score = 49 Significance = 4.24 

Residue Identity = 22% Matches = 66 Mismatches = 174 

Gaps = 50 Conservative Substitutions = 0 


X 

10 

20 30 

40 50 



MGNNCYNWV I VLLLVGCEK VG AV0NSCDNC0PGTFCRK YNPVCKSCPPSTFSS I GG0P 

• • •> till i 

• ' • • i i i i l 

AE I EYLEKTLPFSRSYYW I GI RK I GG I WTWVGTNKSLTEEAENWGDGEPNNKKNKEDCVE I Y I KRNKDAGKW 
90 X 1 00 110 120 130 140 150 

60 70 80 90 100 110 

N CN I CRVCAGYFRFKKFCSSTHNAEC— EC I EGFHC LGPQC TRCEKDCRPG0ELTK0GCKT 

1 • • » l l i l l till it III ll 

* • 1 * fill < l l t l ll l t i t l 

NDDACHKLK A ALC YTASCQPWSCSGHGEC VE 1 1 NN YTCNCD VG Y YGPQCQFV I QCEPLE AP — ELGTMDC— T 
160 170 180 190 200 210 220 

120 130 140 150 160 170 180 

CSLGTFN— DQNGTGVCRPWTNCSLDGRSVLKTGTTEKD WCGPP WSFSPSTT I SV TPEGGPGGH 

1 * • • * I I i i It it i i i l it 

’ 1 • * i itit it it iiii it 

HPLGNFNFNSQCAFSCSEGTN — LTG I EETT CEPFGNWSSPEPTCOV I QCEPLSAPDLG I MNC 

230 240 250 260 270 280 

190 200 210 220 230 240 

SLOVLTLFLALTSALLLAL I F I TLLFSVLKW I RKKFPH I FK0PFKKTTGAAOEEDACSCRCP 

''«««» l* ti i it ll i 

l l l l i l t i it i it it | 

S— HPLASF— SFTSACTF I CSEGTEL I GKKKT I CESSG I WSNPSP I C0KLDKSFSM I KEGDYNPLF I P VAVMV 
290 300 310 320 330 340 350 

250 X 

0EEEGGGGGYEL 


360 


370 


Results file ellis-267-3a-spt.res made by wendy c on Mon 27 Aug 90 16:08:31-PDT. 

Query sequence being compared: ELLIS-267-3A 
Number of sequences searched: 15409 
Number of scores above cutoff: 4274 

Results of the initial comparison of ELLIS-267-3A with: 
Data bank: Swiss-Prot 14, all entries 


[Histogram: NUMBER OF SEQUENCES (logarithmic axis, 10 to 10000) plotted against SCORE, with the corresponding STDEV scale beneath; the plotted points are not recoverable.] 

PARAMETERS 

Similarity matrix         Unitary    K-tuple             2 
Mismatch penalty          1          Joining penalty     20 
Gap penalty               1.00       Window size         32 
Gap size penalty          0.05 
Cutoff score              5 
Randomization group       0 

Initial scores to save    20         Alignments to save  10 
Optimized scores to save  20         Display context     10 

SEARCH STATISTICS 

Scores:    Mean    Median    Standard Deviation 
                   7         1.78 

Times:     CPU            Total Elapsed 
           00:02:39.98    00:08:07.00 


Number of residues: 4914263 

Number of sequences searched: 15409 

Number of scores above cutoff: 4274 

Cut-off raised to 6. 

Cut-off raised to 7. 

Cut-off raised to 8. 

The scores below are sorted by initial score. 
Significance is calculated based on initial score. 
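
The significance values in the lists of best scores are consistent with a score expressed as the number of standard deviations above the mean score of the search. A minimal Python sketch of that reading follows; the function name `significance` and the example numbers are illustrative assumptions, not output of the search program.

```python
def significance(score: float, mean: float, stdev: float) -> float:
    """Number of standard deviations by which `score` exceeds `mean`.

    Assumption: "Sig." is a plain z-score, (score - mean) / stdev.
    """
    if stdev <= 0:
        raise ValueError("stdev must be positive")
    return (score - mean) / stdev


# Illustrative values only: a score of 10 against a mean of 3
# and a standard deviation of 1.25.
print(round(significance(10, 3, 1.25), 2))
```

Under this reading, the "standard deviations above mean" banners in the lists simply group entries by the integer part of this value.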


A 100 % identical sequence to the query sequence was not found. 


The list of best scores is: 

                                                      Init. Opt. 
Sequence Name   Description                    Length Score Score  Sig. Frame 

              **** 7 standard deviations above mean **** 
 1. HM02$HUMAN  OCTAMER-BINDING TRANSCRIPTION     478    19    31  7.86     0 
              **** 6 standard deviations above mean **** 
 2. C1S$HUMAN   COMPLEMENT COMPONENT C1S PRECU    688    17    43  6.73     0 
 3. ECHM$RAT    ENOYL-COA HYDRATASE, MITOCHOND    290    16    39  6.17     0 
 4. MTDM$MOUSE  DNA (CYTOSINE-5)-METHYLTRANSFE   1573    16    26  6.17     0 
 5. TENA$CHICK  TENASCIN (FRAGMENT).              697    16    41  6.17     0 
              **** 5 standard deviations above mean **** 
 6. COA1$BFDV   COAT PROTEIN VP1.                 343    15    30  5.61     0 
 7. CADP$MOUSE  PLACENTAL-CADHERIN PRECURSOR (    822    15    42  5.61     0 
 8. KC2A$DROME  CASEIN KINASE II, ALPHA CHAIN     335    15    26  5.61     0 
 9. NIFA$AZOVI  NIF-SPECIFIC REGULATORY PROTEI    522    15    38  5.61     0 
10. ATPX$ANASP  ATP SYNTHASE B' CHAIN (EC 3.6.    163    15    22  5.61     0 
11. VE6$HPV16   E6 PROTEIN.                       158    15    28  5.61     0 
12. DHAS$ECOLI  ASPARTATE-SEMIALDEHYDE DEHYDRO    367    14    25  5.05     0 
13. COX1$SCHPO  CYTOCHROME C OXIDASE POLYPEPTI    537    14    43  5.05     0 
14. ENV$BLV     ENV POLYPROTEIN (CONTAINS: COA    515    14    44  5.05     0 
15. LYAG$HUMAN  LYSOSOMAL ALPHA-GLUCOSIDASE PR    951    14    41  5.05     0 
16. AST3$DROME  ACHAETE-SCUTE COMPLEX PROTEIN     257    14    37  5.05     0 
17. SKI$HUMAN   SKI ONCOGENE (GENE NAME: SKI).    728    14    40  5.05     0 
18. PRTZ$BOVIN  PROTEIN Z.                        396    14    34  5.05     0 
19. RRPL$VSVSJ  RNA POLYMERASE BETA SUBUNIT (E   2109    14    43  5.05     0 
20. MYSG$CHICK  MYOSIN HEAVY CHAIN, GIZZARD SM   1978    14    35  5.05     0 


The scores below are sorted by optimized score. 

Significance is calculated based on optimized score. 

A 100% identical sequence to the query sequence was not found. 


The list of best scores is: 

                                                      Init. Opt. 
Sequence Name   Description                    Length Score Score  Sig. Frame 

              **** 5 standard deviations above mean **** 
 1. EGF$MOUSE   EPIDERMAL GROWTH FACTOR PRECUR   1217     9    51  5.04     0 
 2. VGLM$PTPV   M POLYPROTEIN PRECURSOR (CONTA   1313     9    51  5.04     0 
              **** 4 standard deviations above mean **** 
 3. GMP1$HUMAN  GRANULE MEMBRANE PROTEIN 140 P    830     7    50  4.58     0 
 4. ALC$RABIT   IG ALPHA CHAIN C REGION (FRAGM    299     9    50  4.58     0 
 5. KAL$HUMAN   PLASMA KALLIKREIN PRECURSOR (E    638     8    50  4.58     0 
 6. OX40$RAT    OX40 ANTIGEN PRECURSOR.           271    12    50  4.58     0 
 7. CA36$CHICK  COLLAGEN ALPHA 3(VI) (GENE NAM   2914     7    49  4.12     0 
 8. LDLR$HUMAN  LOW-DENSITY LIPOPROTEIN (LDL)     860     7    49  4.12     0 
 9. RINI$PIG    RIBONUCLEASE INHIBITOR.           456     9    49  4.12     0 
10. LAM1$HUMAN  LEUKOCYTE ADHESION MOLECULE-1     372     9    49  4.12     0 
11. NGFR$HUMAN  NERVE GROWTH FACTOR RECEPTOR P    427     8    49  4.12     0 
12. ATPB$IPOBA  ATP SYNTHASE BETA CHAIN (EC 3.    489    10    49  4.12     0 
13. SUWA$DROME  SUPPRESSOR-OF-WHITE-APRICOT PR    964     7    49  4.12     0 
14. ACDS$HUMAN  ACYL-COA DEHYDROGENASE PRECURS    412    13    49  4.12     0 
15. CAML$MOUSE  NEURAL CELL ADHESION MOLECULE    1260     7    49  4.12     0 
16. POLS$SINDV  STRUCTURAL POLYPROTEIN (CONTAI   1245    13    49  4.12     0 
17. LNHR$HUMAN  LYMPH NODE HOMING RECEPTOR PRE    372     7    49  4.12     0 
18. NEU$RAT     NEU ONCOGENE PRECURSOR (EC 2.7   1260     9    49  4.12     0 
19. CHIT$PHAVU  ENDOCHITINASE PRECURSOR (EC 3.    328     7    49  4.12     0 
              **** 3 standard deviations above mean **** 
20. HEMA$SENDH  HEMAGGLUTININ-NEURAMINIDASE (E    576     7    48  3.66     0 


1. ELLIS-267-3A 

EGF$MOUSE EPIDERMAL GROWTH FACTOR PRECURSOR (EGF). 

ID EGF$MOUSE STANDARD; PRT; 1217 AA. 
AC P01132; 
DT 21-JUL-1986 (REL. 01, CREATED) 
DT 21-JUL-1986 (REL. 01, LAST SEQUENCE UPDATE) 
DT 01-JAN-1990 (REL. 13, LAST ANNOTATION UPDATE) 
DE EPIDERMAL GROWTH FACTOR PRECURSOR (EGF). 
OS MOUSE (MUS MUSCULUS). 
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA; 
OC EUTHERIA; RODENTIA. 
RN [1] (SEQUENCE FROM N.A.) 
RA SCOTT J., URDEA M., QUIROGA M., SANCHEZ-PESCADOR R., FONG N. M., 
RA SELBY M., RUTTER W. J., BELL G. I.; 
RL SCIENCE 221:236-240(1983). 
RN [2] (SEQUENCE FROM N.A.) 
RA GRAY A., DULL T. J., ULLRICH A.; 
RL NATURE 303:722-725(1983). 
RN [3] (SEQUENCE OF 977-1029) 
RA SAVAGE C. R. JR., INAGAMI T., COHEN S.; 
RL J. BIOL. CHEM. 247:7612-7621(1972). 
RN [4] (DISULFIDE BONDS) 
RA SAVAGE C. R. JR., HASH J. H., COHEN S.; 
RL J. BIOL. CHEM. 248:7669-7672(1973). 
CC -!- FUNCTION: THE GROWTH FACTOR STIMULATES THE GROWTH OF VARIOUS 
CC EPIDERMAL AND EPITHELIAL TISSUES IN VIVO AND IN VITRO AND OF SOME 
CC FIBROBLASTS IN CELL CULTURE. 
CC -!- THE CLEAVAGE SITE FOR THE SIGNAL SEQUENCE IS NOT KNOWN. 
CC -!- THE PRECURSOR SEQUENCE CONTAINS SEVEN REGIONS THAT ARE SIMILAR 
CC TO THE EPIDERMAL GROWTH FACTOR SEQUENCE: RESIDUES 357-399, 
CC 400-440, 441-480, 745-784, 832-885, 886-925, AND 926-976. 
CC -!- CAUTION: REF. 2 SEQUENCE DIFFERS GREATLY FROM RESIDUES 1134-1168 
CC OF THAT SHOWN DUE TO AN INSERTION OF 1 BASE IN THE N.A. SEQUENCE 
CC WITH RESPECT TO THAT OF SCOTT, ET AL., WHICH CAUSES A SHIFT IN THE 
CC READING FRAME. 
DR PIR; A01387; EGMSMG. 
DR EMBL; V00741; MMEGF1. 
DR PROSITE; PS00022; EGF. 
KW EGF-LIKE DOMAIN; GROWTH FACTOR; TRANSMEMBRANE; SIGNAL. 
FT   SIGNAL        1      ? 
FT   CHAIN         ?   1217    EPIDERMAL GROWTH FACTOR. 
FT   REPEAT      321    360 
FT   REPEAT      361    401 
FT   REPEAT      402    442 
FT   REPEAT      443    482 
FT   REPEAT      746    786 
FT   REPEAT      837    875 
FT   REPEAT      876    917 
FT   REPEAT      918    958 
FT   REPEAT      978   1018 
FT   PEPTIDE     977   1029    EPIDERMAL GROWTH FACTOR. 
FT   DISULFID    982    996 
FT   DISULFID    990   1007 
FT   DISULFID   1009   1018 
FT   DOMAIN     1024   1029    NOT REQUIRED FOR FULL BIOLOGICAL ACTIVITY 
FT                             IN VIVO. 
FT   CONFLICT    790    790    D -> Y (IN REF. 2). 
FT   CONFLICT   1048   1048    A -> S (IN REF. 2). 
SQ SEQUENCE 1217 AA; 133143 MW; 7471189 CN; 

Initial Score = 9 Optimized Score = 51 Significance = 5.04 

Residue Identity = 23% Matches = 69 Mismatches = 162 

Gaps = 60 Conservative Substitutions = 0 


X 10 20 30 40 50 60 

MGNNCYNVVV I VLLLVGCEK VGAVQNSCDNCQPGTFCRK YNPVCKSCPPSTFSS I GGQPNCN 

2 1 » >i iii iii • i 

SCFD I DECQRGAHNCAENAACTNTEGGYNCTCAGRPS SPGRSC PDST APSLLGEDGHHLDRN 

920 X 930 940 950 960 970 


I CRVC AGYFRFKKFC SSTHNAECEC I EGFHCLGPQCTRCEKDCRPGQELTKQGCKTC 


1 • » i i i 1 r i i i i i i it , 

SYPGCPSSYDG YCLNGGVCMH I ESLDSYTCNCV I GYSGDRCQ-TR DLRWWELRHAGYGQKHD I MVVAVC 

980 990 1000 1010 1020 1030 1040 


120 130 140 150 160 170 

SLGTFNDQNGTGVCR PWTN— CSLDGRSVLKTGTTEKD VVCGPPVVSFSPSTT I SVTPEGGP 


MVALVLLLLLGMWGTYYYRTRKQLSNPPKNPCDEPSGSVSSSG PDSSSGAAVASCPQPWFWLEKHQOP 

1050 1060 1070 1080 1090 1100 1110 


180 190 200 210 220 230 240 

GGHSLQVLTLFLALTSALLLAL I F I TLLFSV-LKW I RKKFPH I FK0PFKKTTGAA0EEDACSCRCP0EEEG- 

11 * 1 » till i i i t i il i i | | | | | 

11 1 1 1 i < i i i i i i i ii i i i i i i i 

KNGSLPADGTNGAWDA GLSPSL0LGSVHLTSWR0K— PH I DGMGTG0SCW I PPSSDRGP0E I EGN 

1120 1130 1140 1150 1160 1170 

250 X 

GGGGYEL 

i i 

SHLPS YRP VGPEK LHSL0S ANGS 
1180 1190 1200 


2. ELLIS-267-3A 

VGLM$PTPV M POLYPROTEIN PRECURSOR (CONTAINS: NONSTRUCTURAL P 

ID VGLM$PTPV STANDARD; PRT; 1313 AA. 
AC P03517; 
DT 21-JUL-1986 (REL. 01, CREATED) 
DT 21-JUL-1986 (REL. 01, LAST SEQUENCE UPDATE) 
DT 01-OCT-1989 (REL. 12, LAST ANNOTATION UPDATE) 
DE M POLYPROTEIN PRECURSOR (CONTAINS: NONSTRUCTURAL PROTEIN NS-M; 
DE GLYCOPROTEINS G1 AND G2). 
OS PUNTA TORO PHLEBOVIRUS. 
OC VIRIDAE; SS-RNA ENVELOPED VIRUSES; BUNYAVIRIDAE. 
RN [1] (SEQUENCE FROM N.A.) 
RA IHARA T., SMITH J., DALRYMPLE J. M., BISHOP D. H. L.; 
RL VIROLOGY 144:246-259(1985). 
CC -!- SPECIFIC ENZYMATIC CLEAVAGES IN VIVO YIELD MATURE PROTEINS 
CC INCLUDING NONSTRUCTURAL PROTEIN NS-M, GLYCOPROTEIN G1, AND 
CC GLYCOPROTEIN G2. 
DR EMBL; M11156; PTPMRNA. 
DR PIR; A04109; VGVUPT. 
KW POLYPROTEIN; GLYCOPROTEIN; TRANSMEMBRANE; NONSTRUCTURAL PROTEIN. 
FT   CHAIN         1    270    NONSTRUCTURAL PROTEIN NS-M. 
FT   CHAIN       271    809    GLYCOPROTEIN G1. 
FT   CHAIN       810   1313    GLYCOPROTEIN G2. 
FT   CARBOHYD     76     76    POTENTIAL. 
FT   CARBOHYD    102    102    POTENTIAL. 
FT   CARBOHYD    496    496    POTENTIAL. 
FT   CARBOHYD   1154   1154    POTENTIAL. 
FT   CARBOHYD   1243   1243    POTENTIAL. 
SQ SEQUENCE 1313 AA; 146374 MW; 9199811 CN; 

Initial Score = 9 Optimized Score = 51 Significance = 5.04 

Residue Identity = 23% Matches = 68 Mismatches = 172 

Gaps = 53 Conservative Substitutions = 0 


X 10 20 30 40 50 

MGNNC YNVVV I VLLLVGCEK VGAV0NSCDNC0PG — TFCRKY NPVCKSCPPSTFS 


TNVSFVCYEHVG0DEOEVEHRALKRVSVNDCK I VDNSK0K I CTGDHVFCEK YDCSTSYPDVTC I HAPGSGFL 
500 X 510 520 530 540 550 560 






S I GGQPNCN I c RVCAG Y -----^RF | Kg^^STHNAEC EC- 1 EGFHCLGPQCTRCEKDCRPGQELT — K 

• i t t i ii i i i i t i lit 

Y I — NLMGSW I KPQCVGYERVLVDREVKQPLLAPEQNCDTCVSECLDEGVH I KSTGFE I TSA 

570 580 590 600 610 620 

120 130 140 150 160 170 180 

QGCKTCSLGTFNDQNGT — GVCRPWTNCSLDGRSVLKTGTTEKDWCGPPWSFSP — STT I SVTPEGGPGG 

* ' ill ill ii i i i i , 

• 1 ill ill ii iiii • 

V ACSHGSC I SAHQEPSTS V I VP YPGLL ASVGGR I G I HLSHT-SDSASVHM V VVCPPRDSCA AHNCLLC YHG I 
630 640 650 660 670 680 690 

190 200 210 220 230 240 

HSLQ— VLTLFLALTSALLLAL I F I TLLFSVLKW I RKKFPH I FKQPFKKTTG AA0EEDACSC 

• << iiiii ii ill i till l 1 

• • • liill «i III l till i l 

LNYQCHSTLSA I LTSFLL — I LF I YTVFSVTTN I LYVLRL I PKQ— LKSPVGWLKLF I NWLLT ALR I KTRNVM 
700 710 720 730 740 750 760 

250 X 

RCPQEEEGGGGGYEL 

i i 

i i 

RR I NOR I GW VDHHDVERPRHREPMR 
770 780 790 


3. ELLIS-267-3A 

GMP1$HUMAN GRANULE MEMBRANE PROTEIN 140 PRECURSOR. 

ID GMP1$HUMAN STANDARD; PRT; 830 AA. 
AC P16109; 
DT 01-APR-1990 (REL. 14, CREATED) 
DT 01-APR-1990 (REL. 14, LAST SEQUENCE UPDATE) 
DT 01-APR-1990 (REL. 14, LAST ANNOTATION UPDATE) 
DE GRANULE MEMBRANE PROTEIN 140 PRECURSOR. 
OS HUMAN (HOMO SAPIENS). 
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA; 
OC EUTHERIA; PRIMATES. 
RN [1] (SEQUENCE FROM N.A.) 
RA JOHNSTON G. I., COOK R. G., MCEVER R. P.; 
RL CELL 56:1033-1044(1989). 
DR PIR; A30359; A30359. 
DR EMBL; M25322; M25322. 
DR PROSITE; PS00022; EGF. 
KW LECTIN; GLYCOPROTEIN; TRANSMEMBRANE; SIGNAL. 
FT   SIGNAL        1     41 
FT   CHAIN        42    830    GRANULE MEMBRANE PROTEIN 140. 
FT   DOMAIN       42    159    LECTIN. 
FT   DOMAIN      160    199    EGF-LIKE. 
FT   DOMAIN      200    770    COMPLEMENT H/C4B-BINDING. 
FT   TRANSMEM    772    795    PUTATIVE. 
FT   CARBOHYD     54     54    PUTATIVE. 
FT   CARBOHYD     98     98    PUTATIVE. 
FT   CARBOHYD    180    180    PUTATIVE. 
FT   CARBOHYD    212    212    PUTATIVE. 
FT   CARBOHYD    219    219    PUTATIVE. 
FT   CARBOHYD    411    411    PUTATIVE. 
FT   CARBOHYD    460    460    PUTATIVE. 
FT   CARBOHYD    518    518    PUTATIVE. 
FT   CARBOHYD    665    665    PUTATIVE. 
FT   CARBOHYD    716    716    PUTATIVE. 
FT   CARBOHYD    723    723    PUTATIVE. 
FT   CARBOHYD    741    741    PUTATIVE. 
SQ SEQUENCE 830 AA; 90766 MW; 3510536 CN; 

Initial Score = 7 Optimized Score = 50 Significance = 4.58 

Residue Identity = 22% Matches = 66 Mismatches = 174 

Gaps = 47 Conservative Substitutions = 0 

X 10 20 30 40 50 60 

MGNNCYNVVV I VLLLVGCEK VGAVQNSCDNCQPGTFCRK YNPVCKSCPPSTFSS I GGQPNCN 

1,1 iiiiii iti ii 

1 • * i i i i » i ill ii 

NEARVNCSHPFGAFRYQSVCSFTCNEGLLLVGA SVLQCLATGNWNSVPPECGA I PCTPLLS — PQNGTM 

460 X 470 480 490 500 510 520 

70 80 90 100 110 120 

I CRVCAGYFRFKKFCSSTHNAECEC I EGFHCLGP QCTR — CEKDCRPGQELTK QGCKTCS- 

1 1 11 III il ill i • i i « l ii 

« • * « ill ii ill i i i i ii ,i 

TCVQPLGSSSYKS I C QF I CDEGYSLSGPERLDCTRSGRWTDSPPMCEA I KCPELFAPEQGSLDCSD 

530 540 550 560 570 580 

130 140 150 160 170 180 

— LGTFN DQNGTGVCRP— WTNCSLDGRSVLKTGTTEKDWCGPPWSFSPSTT I SVTPEGGPGGH 

* j 1 • • i 1 «i ill i •• til 

TRGEFNVGSTCHFSCNNGFKLEGPNNVECTTSGR— WSATPPTCKG I ASLPTPGLBCPALT TPGQGTMYC 

590 600 610 620 630 640 650 


190 200 210 220 230 240 

SLQVLTLFLALTSALLLAL I F I TLLFSVLKW I RKKFPH I FKOP FKKTTGAAQEEDA — CSCRCPQEEE 

• II it i i i ill 

••• *i i i i iii 

RHHPGT— FGFNTTCYFGCNAGFTL I GDSTLSCRPSGQWTAVTPACRAVKCSELHVNKP I AMNCSNLWGNFSY 
660 670 680 690 700 710 720 


250 X 

G-GGGGYEL 

i I i 

t ii 

GS I CSGHCLEGQLLNGS AG 
730 X 740 
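
The Swiss-Prot entries quoted with each alignment are line-oriented: every record begins with a two-letter line code (ID, AC, DT, DE, OS, OC, and so on) followed by its content. As a rough illustration, the Python sketch below groups the lines of one entry by that code. The function name `parse_entry` is hypothetical, and the rule for recognizing a record line (two letters, then whitespace) is a simplifying assumption that would also accept ordinary text beginning with a two-letter word.

```python
from collections import defaultdict


def parse_entry(lines):
    """Group the lines of one flat-file entry by their two-letter line code."""
    fields = defaultdict(list)
    for raw in lines:
        line = raw.rstrip()
        # Assumed record shape: two letters, whitespace, then the content.
        if len(line) < 3 or not line[:2].isalpha() or not line[2].isspace():
            continue
        # Codes are upper-cased so that "oc" and "OC" land in the same field.
        fields[line[:2].upper()].append(line[2:].strip())
    return dict(fields)


entry = parse_entry([
    "ID GMP1$HUMAN STANDARD; PRT; 830 AA.",
    "AC P16109;",
    "DT 01-APR-1990 (REL. 14, CREATED)",
    "DT 01-APR-1990 (REL. 14, LAST SEQUENCE UPDATE)",
    "DE GRANULE MEMBRANE PROTEIN 140 PRECURSOR.",
    "oc eutheria; primates.",
])
print(entry["AC"])
print(len(entry["DT"]))
```

Repeated codes such as DT are kept as a list in their original order, so continuation lines (for example multi-line CC comments) can be joined afterwards if needed.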


4. ELLIS-267-3A 

ALC$RABIT IG ALPHA CHAIN C REGION (FRAGMENT). 

ID ALC$RABIT STANDARD; PRT; 299 AA. 
AC P01879; 
DT 21-JUL-1986 (REL. 01, CREATED) 
DT 21-JUL-1986 (REL. 01, LAST SEQUENCE UPDATE) 
DT 01-NOV-1988 (REL. 09, LAST ANNOTATION UPDATE) 
DE IG ALPHA CHAIN C REGION (FRAGMENT). 
OS RABBIT (ORYCTOLAGUS CUNICULUS). 
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA; 
OC EUTHERIA; LAGOMORPHA. 
RN [1] (SEQUENCE FROM N.A.) 
RA KNIGHT K. L., MARTENS C. L., STOKLOSA C. M., SCHNEIDERMAN R. D.; 
RL NUCLEIC ACIDS RES. 12:1657-1670(1984). 
CC -!- THIS IMMUNOGLOBULIN BELONGS TO THE IGA-G SUBCLASS. IT WAS ISOLATED 
CC FROM A RABBIT HOMOZYGOUS FOR A2, N80, DE12.15, F71, G75 HEAVY 
CC CHAIN HAPLOTYPE. 
DR PIR; A02174; AHRB. 
DR EMBL; X00353; OCIG02. 
DR PROSITE; PS00290; IG_MHC. 
KW IMMUNOGLOBULIN C REGION. 
FT   NON_TER       1      1 
SQ SEQUENCE 299 AA; 32256 MW; 500462 CN; 

Initial Score = 9 Optimized Score = 50 Significance = 4.58 

Residue Identity = 23% Matches = 69 Mismatches = 153 

Gaps = 70 Conservative Substitutions = 0 

X 10 20 30 40 50 60 

MGNNCYNVVV I VLLLVGCEKVGAVGNSCDNCGPGTFCRK YNPVCKSCPPSTFSS I GGGPNC N I CRVCA 

S j j i'll iii ii i i i i i 

QSGTSGPYT ACSEL I LPVTGCLG — QKS-AAC HVEYNSV I NESLPVPF— PDCCPANSCCTC- 








70 80 90 100 110 120 130 

GYFRFKKFCSSTHNAECEC I EGFHCLGPQCTRCEKDCRPGQELTKQGCK — TCSLGTFNDSNGTGVCRPWTN 


-PSSSSRNL I SGCQPSLSLQRPDLGDLLLGRDASLT CTLSGLKNPEDAVFTWEPTNGNEPVQQRAQ 
60 70 80 90 100 110 120 


140 150 160 170 180 190 

CSLDG RSVL KTGTTEKD VVCGPP VVSFSPSTT I S VTPEGGPGGHSLQVLTLFL ALTSA 

1 ' ill ii it it lit ill ill ill 

• 1 ill ii ii it ill ill iii ill 

RDLSGCYSVSSVLPSSAETWK ARTEFTCTVTHPE I DSGSLT AT I SRGVVTP PQVHLLPPPSEEL ALNEQ 

130 140 150 160 170 180 190 


200 210 220 230 240 250 

LLLAL I F I TLL FS VLKW I R KKFPH I FKQP FKKTTGAAQEEDACSCRCPQEEEGGG 

1 i i i i t i i i i ill III t 

i • i i i i i i i i ill ill i 

VTL TCLVRGFSPKDVLVSWRHGGQEVPEDSFLVWKSMPESSQDKATYA I TSLLRVPAEDWNQG 

200 210 220 230 240 250 


X 

GGYEL 

t 

i 

DTYSCMVGHEGLAEH 

260 


5. ELLIS-267-3A 

KAL$HUMAN PLASMA KALLIKREIN PRECURSOR (EC 3.4.21.34) (PLASMA 

ID KAL$HUMAN STANDARD; PRT; 638 AA. 
AC P03952; 
DT 23-OCT-1986 (REL. 02, CREATED) 
DT 23-OCT-1986 (REL. 02, LAST SEQUENCE UPDATE) 
DT 01-APR-1990 (REL. 14, LAST ANNOTATION UPDATE) 
DE PLASMA KALLIKREIN PRECURSOR (EC 3.4.21.34) (PLASMA PREKALLIKREIN) 
DE (KININOGENIN). 
OS HUMAN (HOMO SAPIENS). 
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA; 
OC EUTHERIA; PRIMATES. 
RN [1] (SEQUENCE FROM N.A.) 
RA CHUNG D. W., FUJIKAWA K., MCMULLEN B. A., DAVIE E. W.; 
RL BIOCHEMISTRY 25:2410-2417(1986). 
CC -!- FUNCTION: THE ENZYME CLEAVES LYS-ARG AND ARG-SER BONDS. IT 
CC ACTIVATES, IN A RECIPROCAL REACTION, FACTOR XII AFTER ITS BINDING 
CC TO A NEGATIVELY CHARGED SURFACE. IT ALSO RELEASES BRADYKININ FROM 
CC HMW KININOGEN AND MAY ALSO PLAY A ROLE IN THE RENIN-ANGIOTENSIN 
CC SYSTEM BY CONVERTING PRORENIN INTO RENIN. 
CC -!- SUBUNIT: THE ZYMOGEN IS ACTIVATED BY FACTOR XIIA, WHICH CLEAVES 
CC THE MOLECULE INTO A LIGHT CHAIN, WHICH CONTAINS THE ACTIVE SITE, 
CC AND A HEAVY CHAIN, WHICH ASSOCIATES WITH HMW KININOGEN. THESE 
CC CHAINS ARE LINKED BY ONE OR MORE DISULFIDE BONDS. 
DR EMBL; M13143; HSPPKKA. 
DR PIR; A00921; KQHUP. 
DR PROSITE; PS00134; TRYPSIN_HIS. 
DR PROSITE; PS00135; TRYPSIN_SER. 
KW HYDROLASE; SERINE PROTEASE; GLYCOPROTEIN; PLASMA; ZYMOGEN; SIGNAL; 
KW FIBRINOLYSIS; BLOOD COAGULATION; INFLAMMATION; LIVER; DUPLICATION; 
KW BRADYKININ. 
FT   SIGNAL        1     19 
FT   CHAIN        20    390    PLASMA KALLIKREIN, HEAVY CHAIN. 
FT   CHAIN       391    638    PLASMA KALLIKREIN, LIGHT CHAIN. 
FT   DOMAIN      389    621    SERINE PROTEASE. 
FT   REPEAT       20    104 
FT   REPEAT      110    194 
FT   REPEAT      200    284 
FT   CARBOHYD    127    127 
FT   CARBOHYD    308    308 
FT   CARBOHYD    396    396 
FT   CARBOHYD    453    453 
FT   CARBOHYD    494    494 
FT   ACT_SITE    434    434    CHARGE RELAY SYSTEM. 
FT   ACT_SITE    483    483    CHARGE RELAY SYSTEM. 
FT   ACT_SITE    578    578    CHARGE RELAY SYSTEM. 
SQ SEQUENCE 638 AA; 71369 MW; 2175970 CN; 

Initial Score = 8 Optimized Score = 50 Significance = 4.58 

Residue Identity = 23% Matches = 70 Mismatches = 164 

Gaps = 69 Conservative Substitutions = 0 


X 10 20 30 40 50 

MGNNC YNVVV I VLLLVGCEK VGAVQNSCDNCQPGTFCRK YNPVCKSCPPST 


DAFVCRT I CTYHPNCLFFTFYTNVWK I ESQRNVCLLKTSE— SGTPSSS — TP0ENT I SGYSLLTCK RTLPEP 
230 X 240 250 260 270 280 290 


60 70 80 90 100 110 

FSS I GGGPNCN I CRVCAGYFRFKK F CSSTHN AECEC I EGFHCLGPQCTRCEKDCRPGQELTKQGCK 


CHSK I YPGVDFGGEELNV- 
300 


TF VKGVNVCQET CTKM I RCQFFTYSLLPEDCKEEK- 

310 320 330 340 


•CKCFLRLSMDGSP 

350 


T- 


120 130 140 150 160 170 

■-CSLGTFNDQNGTG VCRPWTNCSLDGRSVLKTGTTEKD WCGPP WSFSPSTT I SVTPEGGP 


TR I AYGTQGSSGYSLRLCNTGDNSVCTTKT- 
360 370 380 


-STR I V GGTNSSWGEWPW0 VSLQ VKLTAQRHLCGGS 

390 400 410 420 


180 190 200 210 220 230 

GGHSLQVLT LFL ALTSALL-LAL I F I TLLFSVLKW I RKKFPH I FKQPFKKTTG AAQE 



L I GH9WVLT AAHCFDGLPLQDVWR I YSG I LNLSD I TKDTPFSQ I KE I I I HQNYK VSEGNHD I AL I K 

430 440 450 460 470 480 


240 250 X 

EDA CSCRCPQEEEGGGGGYEL 

i i i 

i t i 

LQAPLNYTEFQKP I CLPSKGDTST I YTNCWVTGWG 
490 500 510 X 520 


6. ELLIS-267-3A 

OX40$RAT OX40 ANTIGEN PRECURSOR. 

ID OX40$RAT PRELIMINARY; PRT; 271 AA. 
AC P15725; 
DT 01-APR-1990 (REL. 14, CREATED) 
DT 01-APR-1990 (REL. 14, LAST SEQUENCE UPDATE) 
DT 01-APR-1990 (REL. 14, LAST ANNOTATION UPDATE) 
DE OX40 ANTIGEN PRECURSOR. 
OS RAT (RATTUS NORVEGICUS). 
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA; 
OC EUTHERIA; RODENTIA. 
RN [1] (T-CELL, SEQUENCE FROM N.A.) 
RA MALLETT S., FOSSUM S., BARCLAY A. N.; 
RL SUBMITTED (OCT-1989) TO EMBL/GENBANK DATA BANKS. 
CC -!- SIMILARITY: TO NERVE GROWTH FACTOR RECEPTOR. 
DR EMBL; X17037; RSOX40. 
KW T-CELL; ANTIGEN; GLYCOPROTEIN; SIGNAL. 
FT   SIGNAL        1     19 
FT   CHAIN        20    271    OX-40 ANTIGEN. 
FT   REPEAT       25     60    CYSTEINE-RICH I. 
FT   REPEAT       61    122    CYSTEINE-RICH II. 
FT   REPEAT      123    164    CYSTEINE-RICH III. 
SQ SEQUENCE 271 AA; 29895 MW; 400796 CN; 

Initial Score = 12 Optimized Score = 50 Significance = 4.58 

Residue Identity = 25% Matches = 72 Mismatches = 145 

Gaps = 63 Conservative Substitutions = 0 


X 10 20 30 40 50 

MGNNCYNVVV I VLLLVGCE— KVGAV0NS— CD NC0PGTFCRK YN-PVCKSCPPST 


' i ill ill i it 

' i ill iti i it 


LLLGLSLGVTVKLNCVKDTYPSGHKCCREC0PGHGMVSRCDHTRDTVCHPCEPGFYNEAVNYDTCKOC 


20 X 30 40 50 60 70 

60 70 80 90 100 110 


FSS I GGQPNCN I CRVCAGYFRFKKFCSSTHNAECEC I EG FHCLGPQCTRC-EKDCRPGQELTKQG 

11 1 ill ill i i i i i it i 

1 1 1 ill ill i i i i i ii i 

TQCN HRS GSELKONCTPTEDTVCQCRPGT0PRQDSSHKLGVDCVPCPPGHFSPG SN0A 

80 90 100 110 120 130 


120 130 140 

CKTCSLGTFNDONGTGVCRPWTNCSLD 

• l i i i i i i i 

• • t i i i i i i 


GRSVLKT 

it i i 
• I i i 


150 1GO 170 

— GTT — EKDV — VCGPPVVSFSPSTT I SV 

i> i i I i i i l 

ii i i i i i i t 


CK PWTNCTLSGKOIRHPASN— SLDTVCEDRSLLATLLWETORTTFRPTTVPSTTVWPRTS0LPSTPTLV 


140 150 160 170 180 190 200 

180 190 200 210 220 230 240 


TPEGGPGGHSL0VLTLFLALTSALLLAL I F I TLLFSVLKW I RKKFPH I FKOPFKKTTGAA0EEDACSCRCP0 

• i ' * ill i i i i ii i i . ill 


APE— GPAFAV I LGLGLGLLAPLTVLLAL YLL — RKAWRSPNTPKPCWGNSFRT — P I OEEOTDTHFTLA 

210 220 230 240 250 260 


X 

EEEGGGGGYEL 

KI 

270 


7. ELLIS-267-3A 

CA36$CHICK COLLAGEN ALPHA 3(VI) (GENE NAME: COL6A3) (FRAGMENT 

ID CA36$CHICK STANDARD; PRT; 2914 AA. 
AC P15989; 
DT 01-APR-1990 (REL. 14, CREATED) 
DT 01-APR-1990 (REL. 14, LAST SEQUENCE UPDATE) 
DT 01-APR-1990 (REL. 14, LAST ANNOTATION UPDATE) 
DE COLLAGEN ALPHA 3(VI) (GENE NAME: COL6A3) (FRAGMENT). 
OS CHICKEN (GALLUS GALLUS). 
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; AVES. 
RN [1] (SEQUENCE FROM N.A.) 
RA BONALDO P., RUSSO V., BUCCIOTTI F., DOLIANA R., COLOMBATTI A.; 
RL SUBMITTED (SEP-1989) TO EMBL/GENBANK DATA BANKS. 
RN [2] (SEQUENCE OF 2648-2914 FROM N.A.) 
RA BONALDO P., COLOMBATTI A.; 
RL J. BIOL. CHEM. 264:20235-20239(1989). 
CC -!- FUNCTION: COLLAGEN VI ACTS AS A CELL-BINDING PROTEIN. 
CC -!- SUBUNIT: TRIMERS COMPOSED OF THREE DIFFERENT CHAINS: ALPHA 1(VI), 
CC ALPHA 2(VI), AND ALPHA 3(VI). 
CC -!- PROLINES IN THE THIRD POSITION OF THE TRIPEPTIDE REPEATING UNIT 
CC (G-X-Y) ARE HYDROXYLATED IN SOME OR ALL OF THE CHAINS. 
DR EMBL; M24282; GGCOLAVI. 
KW EXTRACELLULAR MATRIX; CONNECTIVE TISSUE; TANDEM REPEAT; HYDROXYLATION; 
KW GLYCOPROTEIN; CELL ADHESION. 
SQ SEQUENCE 2914 AA; 315788 MW; 2.213953E+07 CN; 

Initial Score = 7 Optimized Score = 49 Significance = 4.12 

Residue Identity = 23% Matches = 65 Mismatches = 168 

Gaps = 42 Conservative Substitutions = 0 

10 20 30 40 50 


MGN-NCYNV VV- I V-LLLVGCEK — VGAVQNSCDNCQPGTFCRK YNPVCKSCPPSTFSS 

1 > i i ii i III I i i i i I it i i i 

• i • * ii i ill i i i i i ■ ii i i i 

I I FLLDGSLNVGNANFPFVRDFVVTLVNYLDVGTDK I RVGLVQFS DTPKTEFSLYSYQTK SD I I Q 


430 X 440 450 460 470 480 

60 70 80 90 100 110 120 


I GGQPNCN I CRVCAGYFRFKKFCSSTHNAE CEC I EGFHCLGPQCTRCEKDCRPGQ ELTKQGCKTCS 

> * > i i i i i , i . i i 

ii i i i i i i i till 

RLGQLRPKGGSV— LNTGSALNFVLSNHFTEAGGSR I NEQVPQVLVLVT AGRSAVPFLQVSNDLARAGVLTFA 
490 500 510 520 530 540 550 

130 140 150 160 170 180 

LGTFN DQNGTGVCRPWTNCSLDGRSVLKTGTTEKDWCGPPWSFSPSTTI— SVTPEGGP— GGHSLQVL 


VGVRNADK AELEQ I AFNPKMVYFMDDFSDLTT - 


-LPQELKKPI TT I VSGGVEEVPLAPTESKKD 


560 570 580 590 600 610 620 

190 200 210 220 230 240 250 


TLFLALTSALLLAL I F I TLLFSVLK W I RKKFPH I FKQPFKKTTGAAQEEDACSCRCPQEEEGGGGG YEL 


ILFLIDGSANLL- 

630 

RMRLKTG 

690 


-GSFPAVRDF I HK V I SDLNVGPDATRVAVAQFSDN I Q I EFDFAELPSKQDMLLKVK 


640 


650 


660 


670 


680 


8. ELLIS-267-3A 

LDLR$HUMAN LOW-DENSITY LIPOPROTEIN (LDL) RECEPTOR PRECURSOR.
ID LDLR$HUMAN STANDARD; PRT; 860 AA.
AC P01130;
DT 21-JUL-1986 (REL. 01, CREATED)
DT 21-JUL-1986 (REL. 01, LAST SEQUENCE UPDATE)
DT 13-AUG-1987 (REL. 05, LAST ANNOTATION UPDATE)
DE LOW-DENSITY LIPOPROTEIN (LDL) RECEPTOR PRECURSOR.
OS HUMAN (HOMO SAPIENS).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA;
OC EUTHERIA; PRIMATES.
RN [1] (SEQUENCE FROM N. A.)
RA YAMAMOTO T., DAVIS C. G., BROWN M. S., SCHNEIDER W. J., CASEY M. L.,
RA GOLDSTEIN J. L., RUSSELL D. W.;
RL CELL 39:27-38(1984).

CC -!- THIS TRANSMEMBRANE GLYCOPROTEIN BINDS LDL, THE MAJOR CHOLESTEROL-
CC     CARRYING LIPOPROTEIN OF HUMAN PLASMA, & TRANSPORTS IT INTO CELLS
CC     BY ENDOCYTOSIS. IN ORDER TO BE INTERNALIZED, THE RECEPTOR-LIGAND
CC     COMPLEXES MUST FIRST CLUSTER INTO CLATHRIN-COATED PITS.
CC -!- THE AMINO END OF THE EXTRACELLULAR DOMAIN CONTAINS 7 OR 8 40-
CC     RESIDUE REPEATS. EACH REPEAT HAS ABOUT 6 CYS RESIDUES, ALL OF
CC     WHICH ARE INVOLVED IN DISULFIDE BONDS. FOLLOWING THESE REPEATS IS
CC     A REGION OF ABOUT 350 RESIDUES THAT IS HOMOLOGOUS WITH PART OF THE
CC     EPIDERMAL GROWTH FACTOR (EGF) PRECURSOR.
CC -!- THE LAST HALF OF THE EXTRACELLULAR DOMAIN CONTAINS STRUCTURAL
CC     EVIDENCE OF REPETITIVE SEQUENCE.
CC -!- AN INTRASTRAND RECOMBINATION EVENT BETWEEN TWO ALU SEQUENCES IN
CC     THE 3' UNTRANSLATED REGION OF MRNA FROM A FAMILIAL HYPERCHOLEST-
CC     EROLEMIA PATIENT RESULTS IN THE DELETION OF THE TRANSMEMBRANE &
CC     CYTOPLASMIC DOMAINS. MOST OF THE RECEPTORS PRODUCED ARE SECRETED,


D'-' I I nuoc mm huncRC iu me. ;=UKI-HVti L-HININU I Ul_Uo I LK ilN UUH I hU 


cc 

cc 

PITS! THEREFQ 
COMPLEXES ARE 

THEY BIND LDL> THESE Receptor-ligand 

DR PIR; A01383; QRHULD.
DR EMBL; K02573; HSLDLR.
KW GLYCOPROTEIN; LDL; CHOLESTEROL METABOLISM; LIPID TRANSPORT;
KW ENDOCYTOSIS; COATED PITS; TRANSMEMBRANE; RECEPTOR; SIGNAL.
FT SIGNAL        1    21
FT CHAIN        22   860  LDL RECEPTOR.
FT DOMAIN       22   788  EXTRACELLULAR.
FT TRANSMEM    789   810
FT DOMAIN      811   860  CYTOPLASMIC.
FT REPEAT       22    61  CYSTEINE RICH.
FT REPEAT       62   102  CYSTEINE RICH.
FT REPEAT      103   141  CYSTEINE RICH.
FT REPEAT      142   180  CYSTEINE RICH.
FT REPEAT      191   229  CYSTEINE RICH.
FT REPEAT      230   268  CYSTEINE RICH.
FT REPEAT      269   309  CYSTEINE RICH.
FT SIMILAR     311   661  WITH EGF PRECURSOR.
FT REPEAT      441   445
FT REPEAT      488   492
FT REPEAT      531   535
FT REPEAT      575   579
FT REPEAT      617   621
FT SITE        721   768  CLUSTERED O-LINKED OLIGOSACCHARIDES.

SQ SEQUENCE 860 AA; 95375 MW; 3807460 CN;

Initial Score    =   7  Optimized Score            = 49  Significance = 4.12
Residue Identity = 23%  Matches                    = 67  Mismatches   = 162
Gaps             =  56  Conservative Substitutions                    = 0


X 1 0 20 30 40 50 60 

MGNNCYNVW I VLLLVGCEK VGAV0NSC DNCGPGTFCRK YNPVCKSCPPSTFSS I GGQPNCN I CRVCA 


MGPWGWKLRWTVALL-LAAAGTAVGDRCERNEF0C0DG- 
X 10 20 30 


•KC I S YKWVCDGSAEC0DGSDES0ETCLSVTCKS 
40 50 60 70 


70 80 90 100 110 120 130 

GYFRFKKFCSSTHNAECEC I EGFHCLGPBCTRC — EKDCRPGBELTKQGC — KTCSLGTFND0NGTGVCRPW 


GDF SCGGRVN- 

80 


140 

TNCS — 


-RCI 


150 


-PBFWRCDG0VDCDNG — SDEQGCPPK T CSODEFRCHDGKC I SRQF 


90 


100 


1 10 


120 


160 


170 


180 190 

-SVTPEGGPGGHSLQVLTL— F 


-LDG RSVLKTGTTEKDVVCGPPVVSFSPSTT I 

« til l« ill lit 

I III It III It! 

VCDSDRDCLDGSDEASCPVL TCGPASFQCNSSTC I PQLWACDNDPDCEDGSDEWPQRCRGLYV 


130 

140 


150 

160 

170 

180 


200 

210 

220 

230 

240 

250 


LALTSALLLAL I F I TL — LFSVLKW I RKKFP-H I FKBPFKKTTGAAOEEDACSCRCPBEEEGGGGGYEL 

# lit I ii III i 

• »ii i ii ill i 

FQGDSSPCSAFEFHCLSGEC I HSSWRCDGGPDCKDKSDEENCA VATCRPDEFBCSDGNC I HGSR0CDREYDC 
190 200 210 220 230 240 250 260 


KDMSDEV 


9. ELLIS-267-3A

RINI$PIG RIBONUCLEASE INHIBITOR.

ID RINI$PIG STANDARD; PRT; 456 AA.
AC P10775;
DT 01-JUL-1989 (REL. 11, CREATED)
DT 01-JUL-1989 (REL. 11, LAST SEQUENCE UPDATE)
DT 01-JAN-1990 (REL. [illegible], LAST ANNOTATION UPDATE)
DE RIBONUCLEASE INHIBITOR.
OS PIG (SUS SCROFA).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA;
OC EUTHERIA; ARTIODACTYLA.
RN [1] (LIVER, SEQUENCE)
RA HOFSTEENGE J., KIEFFER B., MATTHIES R., HEMMINGS B. A., STONE S. R.;
RL BIOCHEMISTRY 27:8537-8544(1988).
CC -!- FUNCTION: THIS PROTEIN IS AN INHIBITOR OF PANCREATIC RNASE AND
CC     ANGIOGENIN.
CC -!- THERE ARE 15 LEUCINE-RICH REPEATS.
CC -!- SIMILARITY: THE REPEATED LEUCINE-RICH SEGMENT IS FOUND IN MANY
CC     PROTEINS.
KW ACETYLATION; TANDEM REPEAT; LEUCINE-REPEAT.

FT MOD_RES       1     1  ACETYLATION.
FT REPEAT       15    43  A1.
FT REPEAT       44    71  B1.
FT REPEAT       72   100  A2.
FT REPEAT      101   128  B2.
FT REPEAT      129   157  A3.
FT REPEAT      158   185  B3.
FT REPEAT      186   214  A4.
FT REPEAT      215   242  B4.
FT REPEAT      243   271  A5.
FT REPEAT      272   299  B5.
FT REPEAT      300   328  A6.
FT REPEAT      329   356  B6.
FT REPEAT      357   385  A7.
FT REPEAT      386   413  B7.
FT REPEAT      414   442  A8.
SQ SEQUENCE 456 AA; 49022 MW; 991302 CN;

Initial Score    =   9  Optimized Score            = 49  Significance = 4.12
Residue Identity = 23%  Matches                    = 68  Mismatches   = 152
Gaps             =  71  Conservative Substitutions                    = 0


X 10 20 30 40 50 

MGNNCYNVVV I VLLLVGCEK VGAVQNSCDNCQPGTFCRK YNPVCKSCP — PSTFS — S I GGQ 

li ii ill i l l l l 

<t it ill i ilia 

ADSACQLETLRLENCGLTPANCKDLCG I VASQASLRELDLGSNGLGDAG I AELCPGLLSPASRLKTLWL 

1 90 200 2 1 0 220 230 240 250 


60 


70 


80 


90 


lOO 


1 10 


120 


PNCN I CRVCAGYFRFK KFCSSTHNAECEC I EGFHCLGPQCTRCEKDCRPGQELTKQGCKTCSLGT 


WECD I T ASGCRDLCRVLQAKETLKELSLAGN — KLGDEGARLL CESLLQPGCQLESLWVKSCSLTA 


260 


270 


280 


290 


300 


310 


320 


130 


140 


150 


160 


170 


FNDQNGTGVCRPWT— NCSL — DGRSVLKTGTTEKDVVCGPPVVSFSPSTT I SVTPEG 


180 

■GPGGHSLQVL 


i > ill ill i i i i i i iti 

i * (ii ill i i i i i a ill 

ACCQ HVSLMLTQNKHLLELQLSSNKLGDSG I QELC QALSQPGTTLRVLCLGDCEVTNSGCSSL — A 

330 340 350 360 370 380 


190 200 210 220 230 240 

TLFL ALTSALLLAL I F I TLLFSVLKW I RKKFPH I FKQPFKKTTGAAQEEDACSCRCPGEE 


SLLLANRSLRELDLSNNCVGDPGVLQLLGS 
390 400 410 


LEQP GCALEQLVLYDTYWTEEVEDR 

420 430 440 


250 X 

EGGGGGYEL 

i i a 

• i i 

LQALEGSKPGLRV I S 
450 X 
10. ELLIS-267-3A

LAM1$HUMAN LEUKOCYTE ADHESION MOLECULE-1 PRECURSOR (LAM-1).

ID LAM1$HUMAN PRELIMINARY; PRT; 372 AA.
AC P15023;
DT 01-APR-1990 (REL. 14, CREATED)
DT 01-APR-1990 (REL. 14, LAST SEQUENCE UPDATE)
DT 01-APR-1990 (REL. 14, LAST ANNOTATION UPDATE)
DE LEUKOCYTE ADHESION MOLECULE-1 PRECURSOR (LAM-1).
OS HUMAN (HOMO SAPIENS).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA;
OC EUTHERIA; PRIMATES.
RN [1] (TONSIL, SEQUENCE FROM N. A.)
RA TEDDER T. F., ISAACS C. M., ERNST T. J., DEMETRI G. D., ADLER D. A.,
RA DISTECHE C. M.;
RL J. EXP. MED. 170:123-133(1989).
DR EMBL; X16150; HSLYAM1.
DR PROSITE; PS00022; EGF.
KW CELL ADHESION; GLYCOPROTEIN; SIGNAL.
FT SIGNAL        1    28
FT PROPEP       29    38
FT CHAIN        39   372  LEUKOCYTE ADHESION MOLECULE-1.
FT CARBOHYD    104   104  POTENTIAL.
FT CARBOHYD    177   177  POTENTIAL.
FT CARBOHYD    232   232  POTENTIAL.
FT CARBOHYD    246   246  POTENTIAL.
FT CARBOHYD    271   271  POTENTIAL.
SQ SEQUENCE 372 AA; 42313 MW; 724484 CN;

Initial Score    =   9  Optimized Score            = 49  Significance = 4.12
Residue Identity = 22%  Matches                    = 66  Mismatches   = 174
Gaps             =  50  Conservative Substitutions                    = 0

X 1 0 20 30 40 50 

MGNNCYN WV I VLLLVGCEK VG AVQNSCDNCQPGTFCRKYNPYCKSCPPSTFSS I GGQP 

• i » » i i i i , 

11 '■ till i 

AE I EYLEKTLPFSRSYYW I G I RK I GG I WTWVGTNKSLTEEAENWGDGEPNNKKNKEDCVE I Y I KRNKDAGKW 
80 90 100 110 120 130 140 

60 70 80 90 100 110 

N CN I CRVCAGYFRFKKFCSSTHNAEC— EC I EGFHC LGPQC TRCEKDCRPGQELTKQGCKT 

• i I i till i iiii ii ill ii 

• • • i i i i i i i i i i ii ill ii 

NDDACHKLK AALCYT ASCQPWSCSGHGECVE I I NN YT CNCD VG Y YGPQCQFV I QCEPLEAP — ELGTMDC— T 
150 160 170 180 190 200 210 

120 130 140 150 160 170 180 

CSLGTFN— DQNGTGVCRPWTNCSLDGRSVLKTGTTEKDVVCGPPVVSFSPSTT I SV TPEGGPGGH 

■ iiii ii ii iiii ii 

III' > till II || lilt || 

HPLGNFNFNSQCAFSCSEGTN — LTG I EETT CEPFGNWSSPEPTCQV I QCEPLSAPDLG I MNC 

220 230 240 250 260 270 

190 200 210 220 230 240 

SLQVLTLFL ALTS ALLLAL I F I TLLFSVLKW I RKKFPH I FKQPFKKTTG A AQEED ACSCRCP 

i • i i i i ii it i ii it i 

• •iiii •* it i ii ii i 

S— HPLASF— SFTSACTF I CSEGTEL I GKKKT I CESSG I WSNPSP I CQKLDKSFSM I KEGDYNPLF I PVAVMV 
280 290 300 310 320 330 340 

250 X 

QEEEGGGGGYEL 

i I 

i i 


TAFSGLAFI IWLARRLKKGKKS 
350 X 360 
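
The Residue Identity values printed in the alignment summaries above appear consistent with matches divided by all aligned columns (matches + mismatches + gaps), truncated to a whole percent. The sketch below checks that reading against four summaries from this listing. The formula is inferred from the printed numbers, not taken from the search program's documentation, and the function name `residue_identity` is illustrative.

```python
def residue_identity(matches: int, mismatches: int, gaps: int) -> int:
    """Percent identity over all aligned columns, truncated (assumed formula)."""
    columns = matches + mismatches + gaps
    return (100 * matches) // columns


# (label, matches, mismatches, gaps, identity printed in the listing)
summaries = [
    ("collagen VI", 65, 168, 42, 23),
    ("LDLR$HUMAN", 67, 162, 56, 23),
    ("RINI$PIG", 68, 152, 71, 23),
    ("LAM1$HUMAN", 66, 174, 50, 22),
]

for label, matches, mismatches, gaps, printed in summaries:
    computed = residue_identity(matches, mismatches, gaps)
    print(f"{label:<12} computed={computed}% printed={printed}%")
```

Rounding to the nearest percent instead of truncating would give 24% for the first three rows, so truncation is the reading that matches the printed values.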



Results file ellis-267-3a.res made by wendyc on Mon 27 Aug 90 14:56:51-PDT.

Query sequence being compared: ELLIS-267-3A

Number of sequences searched: 39513

Number of scores above cutoff: 2415

Results of the initial comparison of ELLIS-267-3A with:
Data bank : GenBank 64.0, all entries
Data bank : UEMBL 23_64.0, all entries

[Histogram of the score distribution: number of sequences (logarithmic axis, ticks at 10, 50, 100, 1000 and 100000) against SCORE (0, 103, 206, 308, 411, 514, 617, 719, 822, 925) and STDEV from the mean.]


PARAMETERS

Similarity matrix         Unitary     K-tuple               4
Mismatch penalty          1           Joining penalty       30
Gap penalty               1.00        Window size           32
Gap size penalty          0.33
Cutoff score              55
Randomization group       0

Initial scores to save    20          Alignments to save    10
Optimized scores to save  20          Display context       10



SEARCH STATISTICS

Scores:    Mean           Median          Standard Deviation
           33             33              13.33

Times:     CPU            Total Elapsed
           00:41:02.99    00:56:42.00

Number of residues:             49483801
Number of sequences searched:   39513
Number of scores above cutoff:  2415



The scores below are sorted by initial score. 
Significance is calculated based on initial score. 


A 100% similar sequence to the query sequence was found:

                                                        Init.  Opt.
    Sequence Name  Description                   Length Score Score   Sig. Frame

 1. MUSTC41BB  Mouse T-cell receptor 4-1BB pr      2350   925   925  66.92  0


The list of other best scores is:

                                                        Init.  Opt.
    Sequence Name  Description                   Length Score Score   Sig. Frame

**** 8 standard deviations above mean ****
 2. HUMCS3     Human chorionic somatomammotro      2740   147   401   8.55  0
**** 7 standard deviations above mean ****
 3. NEULCC     N. crassa laccase gene, 3' end.      726   135   310   7.65  0
**** 6 standard deviations above mean ****
 4. MNKHBD     Spider monkey (A. geoffroyi) de     1959   123   372   6.75  0
 5. TOGTBESP   Tick-borne-encephalitis virus       2450   120   398   6.53  0
 6. SEHCRYAA1  Mole rat alpha-A-crystallin ge      5491   119   329   6.45  0
 7. HUMGHCSA   Human growth hormone (GH-1 and     66495   118   401   6.38  0
 8. HUMNRASR   Human N-ras mRNA and flanking       2436   117   395   6.30  0
 9. RSNEU      Rat mRNA for neuraxin               3418   116   405   6.23  0
10. RATFAS     Rat mRNA for fatty acid syntha      8936   116   415   6.23  0
11. RATFAST    Rat fatty acid synthetase mRNA      2805   115   414   6.15  0
12. DRETUBB2   D. melanogaster beta-2 tubulin      1403   113   396   6.00  0
**** 5 standard deviations above mean ****
13. MUSAB321   Mouse MHC A-beta-3/A-beta-2 me      2689   112   396   5.93  0
14. PIGUFMR    Pig uteroferrin mRNA, complete      1424   110   403   5.78  0
15. M22618     Figure 3. Nucleotide sequence       7253   108   396   5.63  0
16. HSHGMCSF   Human mRNA for granulocyte-mac      1807   108   398   5.63  0
17. HUMCYPMP   Human liver cytochrome P-450 S      1576   108   353   5.63  0
18. M27685     Figure 2. The nucleotide seque      1717   108   404   5.63  0
19. MZEZE19B1  Maize 19 kDa zein mRNA, clone        852   108   288   5.63  0
20. HUMCYPMPA  Human cytochrome P-450 S-mephe      1577   108   353   5.63  0
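
The Sig. column in the list sorted by initial score appears consistent with a plain z-score: the initial score minus the mean (33), divided by the standard deviation (13.33) reported under SEARCH STATISTICS. The sketch below reproduces several printed values under that assumption. The formula is inferred from the numbers in this listing, not from the search program's documentation, and `significance` is an illustrative name.

```python
def significance(score: float, mean: float, stdev: float) -> float:
    """Standard deviations above the mean, rounded to two decimals (assumed formula)."""
    return round((score - mean) / stdev, 2)


MEAN = 33      # mean score from SEARCH STATISTICS
STDEV = 13.33  # standard deviation from SEARCH STATISTICS

# (sequence name, initial score, Sig. printed in the listing)
rows = [
    ("MUSTC41BB", 925, 66.92),
    ("HUMCS3", 147, 8.55),
    ("NEULCC", 135, 7.65),
    ("DRETUBB2", 113, 6.00),
    ("HUMCYPMPA", 108, 5.63),
]

for name, init_score, printed in rows:
    computed = significance(init_score, MEAN, STDEV)
    print(f"{name:<10} computed={computed:.2f} printed={printed:.2f}")
```

The same reading explains the band markers: each "N standard deviations above mean" line precedes the entries whose Sig. value lies between N and N+1.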


The scores below are sorted by optimized score.
Significance is calculated based on optimized score.

A 100% similar sequence to the query sequence was found:

                                                        Init.  Opt.
    Sequence Name  Description                   Length Score Score   Sig. Frame

 1. MUSTC41BB  Mouse T-cell receptor 4-1BB pr      2350   925   925  91.05  0

The list of other best scores is:

                                                        Init.  Opt.
    Sequence Name  Description                   Length Score Score   Sig. Frame

**** 5 standard deviations above mean ****
 2. MZEPOD     Maize pyruvate, orthophosphate      3171    59   424   5.14  0
**** 4 standard deviations above mean ****
 3. MUSB3RP    Mouse band 3-related protein m      4088    59   422   4.80  0
 4. RATTPOFR   Rat thyroid peroxidase (TPOP)       2777    62   420   4.46  0
 5. RNTPO      Rat mRNA for thyroid peroxidas      3237    62   420   4.46  0
 6. FLAP1M     Influenza A/nt/60/68 (h3n2), p      2341    57   419   4.29  0
 7. ECOORI     E. coli replication origin (ori     2675    67   418   4.12  0
 8. ECOORIASN  E. coli replication origin (ori     4012    67   418   4.12  0
 9. RATBAND33E Rat band 3 Cl-/HCO3- exchanger      4057    55   418   4.12  0
10. ECASNA     E. coli asn-A gene for asparag      2170    67   418   4.12  0
11. HUMHBA4    Human alpha globin psi-alpha-1     12847    63   418   4.12  0
12. MUSADAM    Mouse adenosine deaminase mRNA      1379    64   418   4.12  0
**** 3 standard deviations above mean ****
13. HUMINSR    Human insulin receptor mRNA, c      4723    72   417   3.94  0
14. CHKERBBF   Chicken c-erbB oncogene mRNA a      6563    99   417   3.94  0
15. HAMAPRTG   Hamster aprt gene for adenine       3960    57   417   3.94  0
16. HUMNCAM    Human neural cell adhesion mol      1423    72   417   3.94  0
17. [entry illegible in source]
18. HUMPDGFRA  Human platelet-derived growth       5570    60   416   3.77  0
19. FLAPB1AC   Influenza (R/Mallard/New York/6     2341    56   416   3.77  0
20. HUMALDC    Human aldolase C gene.              4252    63   416   3.77  0


1. ELLIS-267-3A 

MUSTC41BB Mouse T-cell receptor 4-1BB protein mRNA, complete

LOCUS MUSTC41BB 2350 bp ss-mRNA ROD 15-SEP-1989
DEFINITION Mouse T-cell receptor 4-1BB protein mRNA, complete cds.
ACCESSION J04492
KEYWORDS T-cell receptor.
SOURCE Mouse (strain C57BL/6) T-lymphocyte cell lines L2 and L3, cDNA to
mRNA.
ORGANISM Mus musculus
Eukaryota; Animalia; Metazoa; Chordata; Vertebrata; Mammalia;
Theria; Eutheria; Rodentia; Myomorpha; Muridae; Murinae; Mus;
musculus.
REFERENCE 1 (bases 1 to 2350)
AUTHORS Kwon,B.S. and Weissman,S.M.
TITLE cDNA sequences of two inducible T-cell genes
JOURNAL Proc. Natl. Acad. Sci. U.S.A. 86, 1963-1967 (1989)
STANDARD full staff_review
COMMENT Draft entry and clean copy of sequence for [1] kindly provided by
B.S. Kwon, 17-MAR-1989.
FEATURES Location/Qualifiers
CDS 146..916
/note="4-1BB protein precursor"
sig_peptide 146..214
/note="4-1BB protein signal peptide"
mat_peptide 215..913
/note="4-1BB protein"
BASE COUNT 590 a 561 c 589 g 607 t 3 others
ORIGIN Unreported.

Initial Score    =  925  Optimized Score            = 925  Significance = 91.05
Residue Identity = 100%  Matches                    = 325  Mismatches   = 0
Gaps             =    0  Conservative Substitutions                     = 0

X 10 20 30 40 50 60 70 

ATGTCCATGAACTGCTGAGTGGAT AAACAGCACGGGAT ATCTCTGTCT AAAGGAAT ATT ACT ACACCAGGAA 

1 1 ' < 1 1 ' ' 1 1 ' * 1 < i > 1 i i i i ■ < ■ i i t < < < • • i i i t i i i i i i i t < i i i i i t i i i t t i i i . l t i i i i i i i i i i i 

1 1 1 1 1 1 ' ' 1 1 < i 'i'll i < ■ i 1 • ■ i i • i i i i i i i i i i i i < i i i • i i •« i i i i i t t i i i t • i i i i i i i 

ATGTCCATGAACTGCTGAGTGGAT AAACAGCACGGGAT ATCTCTGTCT AAAGGAAT ATT ACT ACACCAGGAA 
X 10 20 30 40 50 60 70 

80 90 100 110 120 130 140 

AAGGACACATTCGACAACAGGAAAGGAGCCTGTCACAGAAAACCACAGTGTCCTGTGCATGTGACATTTCGC 

I l t l I i t i t l 1 l l i I I > i t » i i < i i t I I i 

* •• »• •* i«ii i • l > i > i •• i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i i | t i i i i i t t i t i i 

A AGG AC AC ATT CG AC AAC AGG A A AGG AGCCTGT C AC AG A AAACC AC AGTGT CCTGTGC ATGTGAC ATTT CGC 
80 90 100 110 120 130 140 

150 160 170 180 190 200 210 

CATGGGAAACAACTGTTACAACGTGGTGGTCATTGTGCTGCTGCTAGTGGGCTGTGAGAAGGTGGGAGCCGT 

' i t • < * < < < ' < i 1 i i < • * < i < i i i ■ i i < ■ i i • • t • i i i i i t i i i i i i « i i i « i i i i i i < i i i t t i i i i i « i t 

CATGGGAAACAACTGTTACAACGTGGTGGTCATTGTGCTGCTGCTAGTGGGCTGTGAGAAGGTGGGAGCCGT 
150 160 170 180 190 200 210 

220 230 240 250 260 270 280 

GCAGAACTCCTGTGAT AACTGTCAGCCTGGT ACTTTCTGCAGAAAAT ACAATCCAGTCTGCAAGAGCTGCCC 

i I • i 1 i i >■ * l I > * i i i • i f i i i ■ i i ■ i i i i t l i i i I i i i l ■ t 1 ■ i i i i ■ i i i i l i i i l t i i i i t i i i ■ i 

itittiiiitiiitiiitiiiiiiiitttitiitiiiiitiiiittiititiiiiitiiiitiiiiiititi 

GCAGAACTCCTGTGAT AACTGTCAGCCTGGT ACTTTCTGCAGAAAAT ACAATCCAGTCTGCAAGAGCTGCCC 
220 230 240 250 260 270 280 

290 300 310 320 330 340 350 360 

TCCAAGT ACCTTCTCCAGCAT AGGTGGACAGCCGAACTGT AACATCTGCAGAGTGTGTGCAGGCT ATTTCAG 


I I I I I I I I I I I < « 


TCCAAGTACCTTCTCCAGCATAGGTGGACAGCCGAACTGTAACATCTGCAGAGTGTGTGCAGGCTATTTCAG
290 300 310 320 330 340 350 360


370 380 390 400 4 1 0 420 430 

GTTCAAGAAGTTTTGCTCCTCTACCCACAACGCGGAGTGTGAGTGCATTGAAGGATTCCATTGCTTGGGGCC 

i l > i i i i i t >• i i i i i i i i i i i i > i i i t i i i i i i i • i i i t i i t i t i i i i t i i i i i i i i i i i t i 

• I 1 I • I »• I I • I I I I I t I I I I I I i I I I I I I I I I I I I I I I I I I I 1 I « I I I I I I I I I I I t I I I I 1 I t I I I I t I I 

GTTCAAGAAGTTTTGCTCCTCTACCCACAACGCGGAGTGTGAGTGCATTGAAGGATTCCATTGCTTGGGGCC 
370 380 390 400 410 420 430 


440 450 460 470 480 490 500

ACAGTGCACCAGATGTGAAAAGGACTGCAGGCCTGGCCAGGAGCTAACGAAGCAGGGTTGCAAAACCTGTAG 

• • I I * • * • i i i t i i I l ill i i i » i i i i i » i i i i i i i i i i i i • i i i i i i i i i i i i i | i i i i i i i | i | t | | t | | 

• •<••••>'•< t < i i i ( t i i i i i i i i i i i i i i i i i i i f t • i i i i t t i i i i < i i i i i i t i i i i <« t • << i i i i 

ACAGTGCACCAGATGTGAAAAGGACTGCAGGCCTGGCCAGGAGCTAACGAAGCAGGGTTGCAAAACCTGTAG 
440 450 460 470 480 490 500 


510 520 530 540 550 560 570 

CTTGGGAACATTT AATGACCAGAACGGT ACTGGCGTCTGTCGACCCTGGACGAACTGCTCTCT AGACGGAAG 

> I i > * * l l l > l > l i l * l l • l l l l l l I I I l l l l l i l I i > t t I I t t l l I I I l l l ■ l • > t l l I l l l l i t l t I l l l 

• ••••« • * »» i i « » ill l l i i i i i i i i ■ i i » i i t i i » i i i i i i i i i i i i i i i i i i i t i i i i i i t i t i i i i » i 

CTTGGGAACATTT AATGACCAGAACGGT ACTGGCGTCTGTCGACCCTGGACGAACTGCTCTCT AGACGGAAG 
510 520 530 540 550 560 570 

580 590 600 6 1 0 620 630 640 

GTCTGTGCTTAAGACCGGGACCACGGAGAAGGACGTGGTGTGTGGACCCCCTGTGGTGAGCTTCTCTCCCAG 

• i • I • » » till l i l l i i < i i i i i » i i i i i i i i i i i i i (i i < i i i i i i i i i i i | | | • i | | | i | | | | | , , | | , , 

< • » • ' i • • * • • • ' • ■ i > « i i < i i l < • i i i < i i i i i i i < i i i < i i < t i i < i < t i i i i < i i i < i i i i i i i i i i i 

GTCTGTGCTTAAGACCGGGACCACGGAGAAGGACGTGGTGTGTGGACCCCCTGTGGTGAGCTTCTCTCCCAG 
580 590 600 610 620 630 640 

650 660 670 680 690 700 7 1 0 720 

T ACC ACC ATTTCTGTG ACT CC AG AGGG AGG ACCAGGAGGGCACT CCTTGC AGGT CCTT ACCTTGTT CCTGGC 

' * * 1 1 1 * 1 > ' * * I I I I I I > ■ t I I I I ■ t I I I I I i I I I t I I ! I I I I I I > t ! I I » I I I I I I I I I « t I t I I I I I 

• » • • * • » I • I » I I t I « I I I I | | | f | | | | | | | | | | | | | | | | | | | | | | | | | | | | , , | , , , , , , , , , , , , , , , , , 

T ACC ACCATTT CTGTG ACT CC AG AGGG AGG ACCAGGAGGGCACT CCTTGC AGGT CCTT ACCTTGTT CCTGGC 
650 660 670 680 690 700 710 720 

730 740 750 760 770 780 790 

GCTGACATCGGCTTTGCTGCTGGCCCTGATCTTCATTACTCTCCTGTTCTCTGTGCTCAAATGGATCAGGAA 

• • • • • « l i i i i • i • i i i » i it i i i • i i i i t i i i i i i i ii i i i i i i i i i i i i i i| | | i | | | i | ( , | , , , , , , , 

i • t < ■ i i • • i < • ■ i i i i t i i i t < < • i i i > i « t t t i i i « < i i ■ i < < i t t < i i i i i i i i ■ i i i i ■ i < i i t i i i i 

GCTGACATCGGCTTTGCTGCTGGCCCTGATCTTCATTACTCTCCTGTTCTCTGTGCTCAAATGGATCAGGAA 
730 740 750 760 770 780 790 

800 810 820 830 840 850 860 

AAAATTCCCCCACAT ATTCAAGCAACCATTT AAGAAGACCACTGGAGCAGCTCAAGAGGAAGATGCTTGT AG 

> I I I I i I 1 I I * < I t > I I !>• 1 I I I I I 1 I I I I I I I > I 1 I I I I I I I I I I I I I I | | | I | ! I I I I | | I t I I I I I I I 

» • • I I I t I I I I I I I II I III | | | | | | | | | | | | | | | | | | | , | | | | | | > | | | • | | | | | | | | , | , , , , , , , , , , , 

AAAATTCCCCCACAT ATTCAAGCAACCATTT AAGAAGACCACTGGAGCAGCTCAAGAGGAAGATGCTTGT AG 
800 810 820 830 840 850 860 

870 880 890 900 910 920 X 

CTGCCGATGTCCACAGGAAGAAGAAGGAGGAGGAGGAGGCT ATGAGCTGTGATGT ACT ATC 

t • l I • i •< < ( ■ i • t • i • I i t l I i i i i i i i I I < i i i i t l i I i i i i i i i t i i i I i i i i i i i i i 

< i • i • i < •< < i i • i i •• i i t i i i i i < i i i i < i i • • i < i i i i i i i i i i i i i t • i • • i ■ i i i i 

CTGCCGATGTCCACAGGAAGAAGAAGGAGGAGGAGGAGGCT ATGAGCTGTGATGT ACT ATCCT AGGAGATG 
870 880 890 900 9 1 0 920 X 930 
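
The feature coordinates in the MUSTC41BB record above can be cross-checked with simple arithmetic. The sketch below computes the inclusive length of each feature and the implied peptide lengths. It assumes that the last codon of the CDS is a stop codon, which the record does not state but which would explain why mat_peptide ends three bases before the CDS does. The helper name `span_length` is illustrative.

```python
def span_length(start: int, end: int) -> int:
    """Length of an inclusive 1-based coordinate range such as 146..916."""
    return end - start + 1


cds_nt = span_length(146, 916)   # CDS
sig_nt = span_length(146, 214)   # sig_peptide
mat_nt = span_length(215, 913)   # mat_peptide

sig_aa = sig_nt // 3
mat_aa = mat_nt // 3
# Assumption: the final codon of the CDS is a stop codon and codes for no residue.
precursor_aa = cds_nt // 3 - 1

print(f"CDS: {cds_nt} nt, {cds_nt // 3} codons")
print(f"signal peptide: {sig_nt} nt, {sig_aa} residues")
print(f"mature peptide: {mat_nt} nt, {mat_aa} residues")
print(f"precursor: {precursor_aa} residues")
```

Under that assumption the signal and mature peptides together account for the whole precursor, which agrees with the protein alignments earlier in this listing, where the query ruler ends shortly after position 250.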


2. ELLIS-267-3A

MZEPOD Maize pyruvate, orthophosphate dikinase mRNA, compl

LOCUS MZEPOD 3171 bp ss-mRNA PLN 30-SEP-1988
DEFINITION Maize pyruvate, orthophosphate dikinase mRNA, complete cds.
ACCESSION J03901
KEYWORDS pyruvate, orthophosphate dikinase.
SOURCE Maize (strain Golden Cross Bantam) green leaf, cDNA to mRNA, clones
pPPD C 7 1 , 10673.
ORGANISM Zea mays
Eukaryota; Plantae; Embryobionta; Magnoliophyta; Liliopsida;
Commelinidae; Cyperales; Poaceae; Zea; mays.
REFERENCE 1 (bases 1 to 3171)
TITLE Primary pyruvate, orthophosphate dikinase as
deduced from cDNA sequence
JOURNAL J. Biol. Chem. 263, 11080-11083 (1988)
STANDARD full staff_entry
COMMENT Draft entry and printed copy of sequence for [1] kindly provided by
M. Matsuoka, 18-MAY-1988.
FEATURES Location/Qualifiers
CDS 114..2957
/note="pyruvate, orthophosphate dikinase (EC 2.7.9.1)"
mRNA <1..3171
/note="PODK mRNA"
BASE COUNT 691 a 852 c 971 g 657 t
ORIGIN 1 bp upstream of EcoRI site.

Initial Score    =  59  Optimized Score            = 424  Significance = 5.14
Residue Identity = 51%  Matches                    = 521  Mismatches   = 349
Gaps             = 132  Conservative Substitutions                     = 0

X 10 20 30 40 50 

ATGT CC ATGA- ACTGC-TG A-GTGG AT — AAACAGCACGGGAT ATCTCTGTCT AAAGGAAT A 

11 i i i I I I i 111 II i i i i i i i i i i i i i i i 1 i 

' * i i t i i i i ill it i t i i i i t i i i l i i i t i i 

AGGGAGAGCCATTCCCCTCAGACCCCAAGAAGCAGCTGGAGCTAGCA— GTGCTGGCTGTGT— TCAACTCGTG 
910 920 930 940 950 960 970 


60 70 

TT ACT AC ACC AGG — 


-AA- 


80 90 100 110 120 

-AAGGACACATTCGACAA— CAGGAAAGGAGCCTGTCACAGAAAACCACAGTGTC 


GGAGAGCCCCAGGGCCAAGAAGTACAGGAGCATCAACCAGATCACTGGCC — TCA- 
980 990 1000 1010 1020 1030 


GGGGCACCGCCGTGAA 

1040 


130 140 150 160 170 180 

CTGTGCATGTGACAT TTCG CCATGGGAAACAACTGTTACAACGTGGTGGTCATTGTGCTGCTGCT 


C— GTGCA— GTG— CATGGTGTTCGGCAACATGGGGAACACTTCTGGCACCGGCGTGCTC— TTCACCAGGAACC 
1050 1060 1070 1080 1090 1100 lllO 


1 90 200 2 1 0 220 230 240 250 

AGTGGGCTGTGAGAAGGTGGGAGCCG — TGCAGAACTCCTGTGAT AACTGTCAGCCTG— GT A— CTTTCTGCA 

* * ilii i i ill i ii ii iiiii i ill till i< i i i ill 

* ' till i i ill i ii ii iiiii i ill till ii i i i ill 

CCAACACCG— GAGA — GAAGAAGCTGTATGGCGAGTTCCTG— GTGAACGCTCAGGGTGAGGATGTGGTTGCC 
1120 1130 1140 1150 1160 1170 1180 


260 270 280 290 

G AAA AT ACAAT CC AGTCTGC A AG AGCTGCCCT CC 


300 

•AAGT ACCTTCT— CCA- 


310 320 

— GCAT AGGTGGA — CAGC 


GGAAT AAGAACCC CAG-AGGACCTTGACGCCATGAAGAACCTCATGCCACAGGCCTACGACGAGCTTGT 

1190 1200 1210 1220 1230 1240 1250 


330 340 350 360 370 380 

CG — AACTGT AACAT— CTGCAGAGTGTGTGCAGGCT ATTTCAGGTTCAAGAAGTTTTGCTCCTCT ACCCACA 

• IIIII II I III ! I II I I | till I | III I II II | | | 

I IIIII IIIII III till I II | I | till I I III | II II | I • 

TGAGAACTGCAACATCCTGGAGAGCCACTACAAGGAAATGCAGGAT— ATCGAGT— TCACT— GTCCAGGAAAA 
1260 1270 1280 1290 1300 1310 1320 


390 400 410 420 430 440 450 

— ACGCGGAGTGTGAGTGCATTGAAGGA— TTCCATTGCTTGGGGCCACAGTGCACCAGATGTGAAAAGGACTG 

l • ' • • ii i i i i i i i i i i i i i i i i i i i i i i iiiii ii i 

» t • • « ii i i i i i i < i i l l i l i i l i i i i i l iiiii ii i 

CAGGCTGTGGATG— TTGCAGTGCAGGACAGGGAAACGTACGGGCAAAAGTGC CGTGAA GATCG 

1330 1340 1350 1360 1370 1380 

460 470 480 490 500 510 520 

CAGGCCTGGCCAGGAGCTAACGA-AGCAGGGTTGCAAAACCTGT-AGCTTGGGAACATTTAATGACCAG 

• i i i i i i i i i i i i i i ii iiiii it till lilt i iiii 

• I I 1 I I I I I I I I I I I II IIIII || llll IIII I IIII 

C CGTGGACATG— GTTAACGAGGGCCTTGTTG— AGCCCCGCTCAGC— GATCAAGATGGTAGAGCCAGGCC 


530 540 550 560 570 580 590

AACGGT ACTGGCGTCTGTCGACC— CTG GACGAACTGCTC — TCTAGACGGA — AGGT CTGTGCTT A AG A 


ACCTGGACCAGCTTCT— TCATCCTCAGTTTGA— GAACCCGTCGGCGTACAAGGATCAAGTCATTGCCACTGG 
1450 1460 1470 1480 1490 1500 1510


600 6 1 0 620 630 640 650 

CCGGGACCA — CGGAGAAGGACGTGGTGTGTGGACCCCCTGTGGTGA — GCTTCTCTCCCAGTACCA — CCA 

• • ill l i ii i i i i i i i i i l i i i i i ill i l i i i ii 

ii ill i • ii i i i i i i i i i i i i i i i ill i i i i i ii 

TCTG — CCAGCCTCACCTGGGGCTGCTGTG— GGCCAGGTTGTGTTCACTGCTGAAGATGCTGAAGCATGGCA 
1520 1530 1540 1550 1560 1570 1580 


660 670 680 690 700 710

TTTC — TGTGA CT-CCAG — AGGGAGGACCAGGAGGGCACTCCTTGCAGGTCCTTACCTTGTTCCTG— G 

III III III! till till! II III llllll II III 

• II III till I I I I I I I | I II III llllll II III 

TTCCCAAGGGAAAGCTGCT ATTCTGGT AAGGGC— GGAGACCAGCCCT — GAGGACGTT — GGTG-GCATGCA 
1590 1600 1610 1620 1630 1640 1650

720 730 740 750 760 770 780 

CGCTGACATCGGCTTTGCTGCTGGCCCTGATCTTCATTACTCTCCTGTTCT-CTGTGCTCAAA TGGATC 

lilt' III ii III II I I l I I I I I i I I I I I I I I I l III 

i i • i • til ii ill ii i i i i i i i i i i i i i i i i i i i ill 

CGCTGCTGTGGGGATTCTTACAGAGAGGGGT— GGCATGACT— TCCCACGCTGCTGTGGTCGCACGTTGGTGG 
1660 1670 1680 1690 1700 1710 1720

790 800 810 820 830 840 850

AGG A A A A A ATT C — CCCCACAT ATTCAAGCAACCATTT AAGAAGACCACTGGAGCAGCTCAAGAGGA — AGA 


ii it 


GG6A A ATGCTGCGT CTCGGGATGCT C AGGC ATT CGCGT AA- ACGA — TGCGGAGAAGCTCGTGACGATCGGA 
1730 1740 1750 1760 1770 1780 1790


860 870 880 890 900 910 920

TG— CTTGTAGCTGC— CGATGTCCACAGGAAGAAGAAGGAGGAGGAGGA — GGCTA — TGAGCTGTGATGT AC 

■ • ill i iiiiii it i ii i i i i i i i ii ii i iill ii it 

• • < • ' i • • i i i i i t i ii i i i i i i i ii ii i iiii ii ii 

AGCCATGT-GCTGCGCGAAGGTGAGTGGCTGTCGCTGAATG-GGTCGACTGGTGAGGTGATCCTTG-GGAAG 


1800 


1810 


1820 


1830 


1840 


1850 


1860 


X 

TATC 


CAGCCGCTTTCCCC 

1870 


3. ELLIS-267-3A

MUSB3RP Mouse band 3-related protein mRNA, complete cds.

LOCUS MUSB3RP 4088 bp ss-mRNA ROD 15-MAR-1989
DEFINITION Mouse band 3-related protein mRNA, complete cds.
ACCESSION J04036
KEYWORDS band 3-related protein.
SOURCE Mouse (strain CD-1; adult) kidney and 70Z/3/3 pre-B cell line, cDNA
to mRNA, clone p70ZN8.
ORGANISM Mus musculus
Eukaryota; Animalia; Metazoa; Chordata; Vertebrata; Mammalia;
Theria; Eutheria; Rodentia; Myomorpha; Muridae; Murinae; Mus;
musculus.
REFERENCE 1 (bases 1 to 4088)
AUTHORS Alper,S.L., Kopito,R.R., Libresco,S.M. and Lodish,H.F.
TITLE Cloning and characterization of a murine band 3-related cDNA from
kidney and from a lymphoid cell line
JOURNAL J. Biol. Chem. 263, 17092-17099 (1988)
STANDARD full staff_entry
COMMENT Draft entry and computer-readable sequence for [1] kindly provided
by S. Alper, 16-SEPT-1988.
CDS 183..3896
/note="band 3-related protein"
repeat_region 3882..3938
/note="degenerate tandem repeat copy A"
repeat_region 3939..4001
/note="degenerate tandem repeat copy B"
BASE COUNT 842 a 1186 c 1211 g 849 t
ORIGIN 106 bp upstream of XbaI site.

Initial Score    =  59  Optimized Score            = 422  Significance = 4.80
Residue Identity = 50%  Matches                    = 504  Mismatches   = 368
Gaps             = 124  Conservative Substitutions                     = 0

X 10 20 30 40 50

ATGTCCATGAACTGCTGAGTGGAT AAACAGCACGGGAT ATCTCTG — TCT AAAGGA— AT ATT 


CACCTCGGGCACGGCCACGGGCCCC- 
1100 1110 1120


-GCATAAGCCCCATGAGGTGTTTGTGGAGCTGAATGAGCTGCT 
1130 1140 1150 1160 


60 70 80 90 100 110 120 

ACT ACACCAGGAAAAGGA — C AC ATT CG AC A AC AGGA A AGG AGCCTGT C AC AG A A A ACC AC AGTGT CCTGTG 


GTTGGACAAAAACCAGGAGCCTCAGTGG 

1170 1180 1190 


-CGGGAGA— CAGCCCGCTGGATAAAATTCGAGGAGGATGTG 
1200 1210 1220 


130 140 150 160 170 180 190 

CATG— TGACATTTCGCCATGGGAAACAACTGTTACAACGTGGTGGTCATTGTGCTG-CTGCTAGTGGGCTGT 

l' till it Itii I II i il titli i i i i i i i | ii 

»» III! II lltl I It | || lllll I I It II | | || 

GAAGAGGAGACTGAGCGCTGGGGGAAGCCTCATGTGGCCTCACTGTCCTTCCGTAGCCTCCTGGAGCTCCGC 
1 230 1 240 1 250 1 260 1 270 1 280 1 290 1 300 

200 2 1 0 220 230 240 250 260 

GAGAAGGTGGGAGCCGTGCAGAACTCCTGTGATAACT — GTCAGCCTGGT ACTTTCTGCAGAAAAT ACAATC 

• • til i i i I i i i i t i i i I i < i i i i i ii ill i i i i i 

ii lit i i i i i i i i i • i i t i i i i i i i ii ill i till 

AGGACTCTGG — CCCATGGAGCTGTGCTCTTAGACCTCGATCAG— CAGACCCTGCCTG — GGGTGGCCCATC 
1310 1320 1330 1340 1350 1360 


270 280 290 300 310 320 330 

CAGTCTGCAAG-AGCTGCCCTCCAAGT ACQTTCTCCA— GCAT AGGTGGACAGCCGAACTGT AAC ATCTG 

11 l i i i I l l i i ill i l i i i I l i I i l i i i i ii il ii 

i l i I l l l 1 i til I I 1 i i i i i i l i l i i i it it ii 

AGGTGGT CGAGCAGATGGT CAT CTCTG ACC AGAT CAAGGCAG AGG— AT AG AGCC A ATGTGCT ACGGGCCCT C 
1370 1380 1390 1400 1410 1420 1430 


340 350 360 370 380 390 400 

CAGAGTGTGTGCAGGCT ATTTCAGGTTCAAGAAGTTTTGCTCCTCT ACCCACAAC GCGGAGTGT — G 

1 • • iiiii it iiiii i i • i i i ill lit till ii 

I' i iiiii it iiiii i i i i i i ill ill i i i i ii 


CTGCTAAAGCACA-GCCACCCAAGTGACGAGAAAGAGTTCTCCTTCCCCCGAAACATCTCAGCGGGCTCTCT 
1 440 1 450 1 460 1 470 1 480 1 490 1 500 1510 


410 420 430 440 450 460 

AG TGCATTGAAGGAT— TCCATTGCTTGGGG — CCACAGTGCACCAGATGTGAAAAGGACT-GCAGGCC 

• » ' ill ill till i iiii i i i i l t il iiiii ill ii 

• • • ill ill iiii t iiii i i i i i i ii iiiii til it 

AGGCTCTCTACTGGGGCATCACCATGCCCAGGGGACCGAGAGTGATCCTCATGTCACTGAGCCTCTCATTGG 
1520 1530 1540 1550 1560 1570 1580 


470 480 490 500 510 520 530 

TGG CCAG— GAGCT AAC— GAAGCAGGGTTGCAAAACCTGT AGCTTGGGAACATTT AATGACCAGAACGG 


TGGTGTTCCTGAGACCCGACTGGAGGTGGATAG— AGAGCGTG-AGCTACCACCCCCAGCACCACCTGCA— GG 
1590 1600 1610 1620 1630 1640 1650 


540 550 560 570 580 590 600 

T ACTGGCGTCT— GTCGACCCTGGACGAACTG— CTCT— CT AGACGGAAGGTCTGTGCTT AAGACCGGGACCAC 

>*> ' • i til i i i t t i i i i i i i t t • i i i t ii ii i i t • i i 

• • • ' i ' ill i i i i i i i i • i i i i t t t i i i i« i i i i i i i i 

T ATT ACCCGCTCCAAGTCCAAGCATGAGCTGAAGCTGCTGGA — GAAGATCCCTG— AGAATGCGGAGGCT AC 


1660 1670 1680 1690 1700 1710 1720


610 620 630 640 650 660

GGAGAAGGACGTGGTGTGTGGACCCCCTGTGGTGAGCTTCTCTCCCAGTACCACCAT TTCTGTGACTC 


AG TGGTCCTCGTGGG- 

1730 


-CTGTGTGGAGTTCCTCTCCC— GCCCTACCATGGCCTTC— GTG-CGG 
1740 1750 1760 1770 


670 680 690 700 710 720 730 

CAGAGGGAGGACCAGGAG— GGCACTCCTTGCAGGTCCTTACCTTGTTCCTG— GCGCTGACATCGGCTTTGCT 

1 I t I I I I III! | | || III I | III II I I I I I I I I I I II | I I I I I 

1 I I I I I I till | | || III | | III II I I I I I I I I I I II I I I I I I 

TTGCGGG AGGCTGTGG AGCTGG ATGCCGTGCT AG— AGGTGCCT— GTGCCTGTGCGCT -TCCT CTT C— TTGCT 
1780 1790 1800 1810 1820 1830 1840 

740 750 760 770 780 790 800 

GCTGG — CCCTGATCTTCATTACTCTCCTGTTCT— CTGTGCTCAAATGGATCAGGAAAAAATTCC— CCCACA 

* » » » ' till • i I i • ill ill ii i I i i l i i l i i i i i i i I 

1 ' ' 1 ‘ i i i i i i i i i ill ill ii i i i i i i i i i i i i i i i i 

GCTGGGACCCAG CAGTGCT AACATGGACT ACCATG AGATCGGCC — GCTCCATTTCCACCCTCA 

1850 1860 1870 1880 1890 1900 

810 820 830 840 850 860 870 

TAT TCAAGCAA — CCATTT AAGAAGACCA-CTGGAGCA— GCTCAAGAGGAAGATGCTTGT AGCTGCCGA 


i i i l i i i II i i i 

i i i i i i i ii i i i 


TGTCTGACAAGCAATTTCA— TGAGGCAGCCTACCTGGCGGATGAACGAGACG— ACTTGCTGACTGCTATCAA 
1910 1920 1930 1940 1950 1960 1970 


880 

TGTC— CACAGGA— AGAAG- 


890 900 910 920 X 

-AAGG AGGAGG AGG A— GGCT ATG AGCTGTG ATGT ACT AT C 


TGCCTTCCTGGACTGCAGTGTTGTGCTACCGCCTTCTGAAGTGCAGGGCGAGGAGCTGCTGCGTTCTGTTGC 
1980 1990 2000 2010 2020 2030 2040 2050


CCATTTCC 


4. ELLIS-267-3 A 

RATTPOFR Rat thyroid peroxidase (TPOP) mRNA, 3’ end.

LOCUS RATTPOFR 2777 bp ss-mRNA ROD 15-JUN-1990

DEFINITION Rat thyroid peroxidase (TPOP) mRNA, 3’ end.

ACCESSION M31655

KEYWORDS thyroid peroxidase.

SOURCE Rat thyroid cell line FRTL-5, cDNA to mRNA.

ORGANISM Rattus norvegicus

Eukaryota; Animalia; Metazoa; Chordata; Vertebrata; Mammalia;
Theria; Eutheria; Rodentia; Myomorpha; Muridae; Murinae; Rattus;
norvegicus.

REFERENCE 1 (bases 1 to 2777)

AUTHORS Isozaki,O., Kohn,L.D., Kozak,C.A. and Kimura,S.

TITLE Thyroid peroxidase: Rat cDNA sequence, chromosomal localization in
mouse, and regulation of gene expression by comparison to
thyroglobulin in rat FRTL-5 cells
JOURNAL Mol. Endocrinol. 3, 1681-1692 (1989)

STANDARD simple staff_entry
FEATURES Location/Qualifiers

CDS <1..2313

/note="thyroid peroxidase"

BASE COUNT 690 a 752 c 722 g 613 t

ORIGIN


Initial Score = 62 Optimized Score = 420 Significance = 4.46

Residue Identity = 51% Matches = 515 Mismatches = 347

Gaps = 137 Conservative Substitutions = 0


X 10 20 30 40 50



II II 


I I I 


II I I till 


I t I 


I I I 


» • • n' Va ' 'i ui' r 1 ‘ • ' » » » iiii i i i i ii i i i it • , 

ACAGACGCTCAGAGGCAGgf8PftTO ll &SftAftl$$ATTCACTACCTCGGGTCATCTGTGACAACACCGGCCTCAC 
1570 1580 1590 1600 1610 1620 1630 


60 70 80 

T ACT ACACCAGGAAAAG GACACATTCGA- 


90 100 110 

-CAACAGGA — AAGGAGCCTGTCACAGAAAACCACA 


iiii 


CAGAGT ACCTGTGGATGCCTTCCGT ATTGGAAAGTTCCCCCAGGACTTTGAATCCTGTGA-GGAAATCCCT A 
1640 1650 1660 1670 1680 1690 1700 

120 130 140 150 160 170 180 

GTGTCCTGTGCATGTGACATTTCGCCATGGGAAACAACTGTTACAACGTGGTGGTCATTGTG— CTGC T 

1 III II I I I i l t i i l i i i it iiii i i i i i i i i i t 

• ill ii i i i i i i i i i ■ i i ii iiii t i i t i i i i i i 

G CATGGACCTCAGAC-TGTGG AGGGAGAC — CT — TCCCACAAGACGACAAGTGTGTCTTCCCAGA 

1710 1720 1730 1740 1750 1760 

190 200 210 220 230 240 250 

GCT AGTGGGCTGTGAGAA— GGTGGGAGCCGTGCAGAACTCCTGTGAT AACTGTCAGCCTGGT ACTTTCTGCA 

1 IIII l II III II i i III lilt ti ii it i III i ill 

1 iiii i ii til ii i i ill iiii it ii ii i ill t i i i i i til 

GAAGGTGGACAATGGGAACTTTGTGCACTGTGAAGAA— TC— TGGGA— AGCTGGTA — CTGGTGT ATTCCTGT 
1770 1780 1790 1800 1810 1820 1830 

260 270 280 290 300 310 

GAAAAT ACAATCCAGT CTGCAAGAGCTGCCCTCCAAGT— ACCT— T CT CCAGCAT AGGT— GGAC — AG 

• • I* >1 i i t • i t i l i ii ii ii i i iiii ii i i iiii ii 

’ 1 'I *' I 1 I i I i l i i II ll ii I I till ii I I till Ii 

TTCCAT— GGAT ACAAGCTGCAAGGCCAG — GAGCAGGTCACATGTACCCAGAATGGATGGGACTCAGAGCCT 
1840 1850 1860 1870 1880 1890 1900 

320 330 340 350 360 370 380 390 

CCGAACTGT AACATCTG— CAGAGTGTGTGCAGGCT ATTTCAGGTTCAAGAAGTTTTGCTCCTCT ACCCACAA 

*l i l I i l l i ll I l i i i i i l l 1 ll ll Ii i iiii ill l i l i i 

if i i i i i i i ii i i i < i i i i i i ii i i ii i i ill ill ii ill 

CCTGTCTGT AA-AGATGTT AATGAGTGTGCAG AT — CTGACACACCCACCT-TGCCACTC-CTCCGCAA 

1910 1920 1930 1940 1950 1960 

400 410 420 430 440 450

CGCGGAGTGTGAGTGCATTGAAGGA TTCCATTGCTTGGGGCCACAGTGCACCAGATG — TGAAAAGGA— 


AGTGC A AG A AC ACCA AGGG A AGCTTCC AGTG — TGTGTGCACAGACCCCT ACATGCT AGGTGAGGAT 

1970 1980 1990 2000 2010 2020 2030 


460 470 480 490 500 510 520 

CTGCAGGCCTGGCCAGGAGCT AACGAAGCAGGGTTGCAAAACCTGT AGCTTGGGAACATTT AATG— AC — CA 

• ' * • • i I • i ii iiii ll i i i i t ii iiii iiii i 

• * • i • i i •• <i iiii ll i i i i i ii till i i i i i 

GAGAAGACCTGCATAGATTCTGGC— AGGCTACCTCGGGCATCCTGGGTCTCCATTGCATTGGGTGCACTTCT 
2040 2050 2060 2070 2080 2090 2100 


530 540 550 560 570 580 590 

GAACGGT ACTGGCGTCTGTCGACCCTGGACGAACTGCTCTCT AGACGGA— AGGT— CTGTGCTT AAGACC 

* III i i i i i i i i i i i i i ii ii iiii iiii i i i i i 

i lit i i i i i i i i i i i i i ii ii iiii iiii i i t i i 

C ATTGGTGGTTTGGCC AGT CT C AGCTGG AC — TGT AATTTGCAGGTGGACACATGCTGAT AAGAAGTCCACA 
2110 2120 2130 2140 2150 2160 2170 


600 610 620 630 640 650 

GGGACCACGGAGA— AG-GA— CGTGGTGTGTGGACCCCCTGTG-GTGAGCTTCTCTCCCAGT — ACCAC- 


i i i 


< l l 


i i i i i 


TTGCTGATCACCGAGAGAGTGACCATGGAGTCAGGATTCAGAAAGAGTCAG— GAGAGTGGGATTTCACCACA 
2180 2190 2200 2210 2220 2230 2240 

660 670 680 690 700 710 720 

CATTTCTGTGACTCCAGAGGGAGGACCAGGA — GGGC— ACTCCTTGCAG— GTCCTT ACCT — TGTTCCTGGCG 

1 III i i I I I l I I I I I I I lit I l i . 1 i i | i i i I I I l I I II 

' * • * •• til i ii iitit ill i t iii iiiii ill ill i i 

AAAGGCCGAGGTTCAAGA— TGCTGAACAGGAACCGGCTTATGGATCCAGAGTCCT — CCTGTGTG A AT AGAA 
2250 2260 2270 2280 2290 2300 2310 


730 740 750 760 770 780 790


CTGACATCGGCTTTGCTGCTGGCCCTGATCTTCATT — ACTCTCCTGTTCTCTGTGCTCAAATGGATCAGGA

< i • i i i i i it iii i i i i i i i i i i it i i i i iiii ii iii 

GTCCTCACTGCTTTGGAGCCAGACATTGGC-TAATTCAAGTCTCAAGCTGCCTGGG — CAAA — GA — AAGA


2320 2330 2340 2350 2360 2370 2380


800 810 820 830 840 850 860

AAAAATTCCCCCACAT ATTCAAG-CAACCATTT AAGAAGACCA-CTGGAGCAGCTCAAGAGGAAGATGCT — T 

• <1 i i i t i t l i l i l i I 1 i I i l i i ill l t l l II till 

i ii i i i i i i i i i i i i i i i i i i i i iii iiii it iiii 

CATGAT ACATGTTGAAGTCAGAGGCTTGAGGACACCAGATGGTT AATCTT ATCAGTCCAAGGCTGC 

2390 2400 2410 2420 2430 2440 


870 880 890 900 910 920 

GTAGCTGCCGATGTCCA CAGGAAGAAGAAGGAGGAGGAGG — AGGCT ATGAGCTGTG ATGTACTAT 

I I I < I II IIII II | I 1 I I I I III IIII I II I I 1 I I III II I 

I I • < • II IIII II I I I I I I I III IIII I II I I I I I III II I 

ATAGCT GAGTTCCATCTCATGTTTTTCCA-CAGGAGCAGGCCAGGCCA— GA— CTGTGCTAATG— CCTCT 

2450 2460 2470 2480 2490 2500 2510 


X 

C 


CCT ACACAAGT 
X 2520 


5. ELLIS-267-3 A 

RNTPO Rat mRNA for thyroid peroxidase

ID RNTPO standard; RNA; ROD; 3237 BP.

XX

AC X17396; M27275;

XX

DT 05-JAN-1990 (annotation)

XX

DE Rat mRNA for thyroid peroxidase
XX

KW thyroid peroxidase.

XX

OS Rattus norvegicus (rat)

OC Eukaryota; Metazoa; Chordata; Vertebrata; Tetrapoda; Mammalia;

OC Eutheria; Rodentia.

XX

RN [1] (bases 1-3237)

RA Rapoport B.;

RT ;

RL Submitted (22-AUG-1989) on tape to the EMBL Data Library.

XX

RN [2] (bases 1-3237)

RA Derwahl M., Seto P., Rapoport B.;

RT "Complete nucleotide sequence of the cDNA for thyroid peroxidase
RT in FRTL5 rat thyroid cells";

RL Nucleic Acids Res. 17, 8330-8330 (1989).

XX 

CC *source: cell line=FRTL5.

XX 

FH Key From To Description 

FH 

FT CDS 42 2783 thyroid peroxidase ( AA 1-914) 

XX 

SQ Sequence 3237 BP; 816 A; 874 C; 831 G; 716 T; 0 other;

Initial Score = 62 Optimized Score = 420 Significance = 4.46 

Residue Identity = 51% Matches = 515 Mismatches = 347 

Gaps = 137 Conservative Substitutions = 0 

X 10 20 30 40 50

ATGT CC ATG A ACTGCTG AGTGGAT A A AC— AGCACGGG AT AT CT CTGT CT AAA — GGAAT-AT 


ACAGACGCTCAGAGGCAGGAACTG-GAAAeiGCATTCACTACCTCGGGTCATCTGTGACAACACCGGCCTCAC 
2040 2050 2060 2070 2080 2090 2100 2110


60 70 80 

T ACT ACACCAGGAAAAG GACACATTCGA- 


90 100 110 

-CAACAGGA — AAGGAGCCTGTCACAGAAAACCACA 


i i i i ■ 


CAGAGTACCTGTGGATGCCTTCCGTATTGGAAAGTTCCCCCAGGACTTTGAATCCTGTGA— GGAAATCCCTA 


2120 2130 2140 2150 2160 2170 2180


120 130 140 150 160 170 180


GTGTCCTGTGCATGTGACATTTCGCCATGGGAAACAACTGTTACAACGTGGTGGTCATTGTG— CTGC T 


G C ATGG ACCT CAG AC— TGTGG AGGGAGAC — CT — TCCCACAAGACGACAAGTGTGTCTTCCCAGA 


2190 2200 2210 2220 2230 2240

190 200 210 220 230 240 250 

GCT AGTGGGCTGTGAGAA— GGTGGGAGCCGTGCAGAACTCCTGTGAT AACTGTCAGCCTGGT ACTTTCTGCA 


GAAGGTGGACAATGGGAACTTTGTGCACTGTGAAGAA— TC— TGGGA— AGCTGGTA — CTGGTGT ATTCCTGT 
2250 2260 2270 2280 2290 2300 


260 270 280 290 300 310

GAAAATACAATCCAGTCTGCAAGAGCTGCCCTCCAAGT— ACCT— TCTCCAGCATAGGT— GGAC — AG 


1 I I ■ II I 


TTCCAT— GGAT ACAAGCTGCAAGGCCAG — GAGCAGGTCACATGTACCCAGAATGGATGGGACTCAGAGCCT 
2310 2320 2330 2340 2350 2360 2370 

320 330 340 350 360 370 380 390 

CCGAACTGTAACATCTG— CAGAGTGTGTGCAGGCTATTTCAGGTTCAAGAAGTTTTGCTCCTCTACCCACAA 

1 < • I I I I I i II I I I I I I i I I l ll II II I till III I I I I I 

i« 1 »i • i i i i i • i i i ii < i ii i i iii lit ii tit 

CCTGTCTGTAA— AGATGTTAATGAGTGTGCAG AT — CTGACACACCCACCT-TGCCACTC-CTCCGCAA 

2380 2390 2400 2410 2420 2430 2440 


400 410 420 430 440 450

CGCGGAGTGTGAGTGCATTGAAGGA TTCCATTGCTTGGGGCCACAGTGCACCAGATG — TGAAAAGGA— 


i i i i 


-AGTGCAAGAACACCAAGGGAAGCTTCCAGTG — TGTGTGCAC AG ACCCCT ACATGCT AGGTG AGG AT 
2450 2460 2470 2480 2490 2500 


460 470 480 490 500 510 520

CTGCAGGCCTGGCCAGGAGCT AACGAAGCAGGGTTGCAAAACCTGT AGCTTGGGAACATTT AATG-AC — CA 

* • l l • l l II II I i I I II I i I I I ll i i i I i i i i ■ 

•••'••i •• ii i i i * ii iiiii ii till t i ii i 

GAGAAGACCTGCATAGATTCTGGC— AGGCTACCTCGGGCATCCTGGGTCTCCATTGCATTGGGTGCACTTCT 
2510 2520 2530 2540 2550 2560 2570 

530 540 550 560 570 580 590 

GAACGGTACTGGCGTCTGTCGACCCTGGACGAACTGCTCTCTAGACGGA— AGGT— CTGTGCTTAAGACC 

■ 111 1 IIIII i l I I l i l ii ll till l i i l iiiii 

• iii i iiiii i i i i i i i ii ti i i i i i i i i iiiii 

C ATTGGTGGTTTGGCC AGT CT C AGCTGG AC — TGT AATTTGCAGGTGGACACATGCTGAT AAGAAGTCCACA 
2580 2590 2600 2610 2620 2630 2640 

600 610 620 630 640 650 

GGGACCACGGAGA— AG— GA— CGTGGTGTGTGGACCCCCTGTG— GTGAGCTTCTCTCCCAGT — ACCAC- 

ii iii iiii ii ii t iii ii iii i i ii ii i i | iiitr 

i ' i i • • i i * • • > i i i i i i i iii i iiiii i ii iiiii 

TTGCTGATCACCGAGAGAGTGACCATGGAGTCAGGATTCAGAAAGAGTCAG— GAGAGTGGGATTTCACCACA 
2650 2660 2670 2680 2690 2700 2710 

660 670 680 690 700 710 720

CATTTCTGTGACTCCAGAGGGAGGACCAGGA — GGGC— ACTCCTTGCAG— GTCCTTACCT— TGTTCCTGGCG 

• III Iiiii i i i i i i i i iii i i iii iiiii iii i i • i i 

i • • i ii iii i ii iiiii iii i i iii iiiii iii iii i i 

AAAGGCCGAGGTTCAAGA— TGCTGAACAGGAACCGGCTTATGGATCCAGAGTCCT — CCTGTGTGAAT AGAA 
2720 2730 2740 2750 2760 2770 2780 


730 740 750 760 770 780 790 

CTGACATCGGCTTTGCTGCTGGCCCTGATCTTCATT — ACT CTCCT GTTCT CTGTGCTC A A ATGG AT C AGGA 



GTCCTCACTGCTTTGGAGCCAGACATTGGC-TAATTCAAGTCTCAAGCTGCCTGGG — CAAA — GA — AAGA 
2790 2800 2810 2820 2830 2840 2850


800 810 820 830 840 850 860

AAAAATTCCCCCACAT ATTCAAG-CAACCATTT AAGAAGACCA-CTGGAGCAGCTCAAGAGGAAGATGCT — T 


CATGAT ACATGTTGAAGTCAGAGGCTTGAGGACACCAGATGGTT AATCTT ATCAGTCCAAGGCTGC 

2860 2870 2880 2890 2900 2910


870 880 890 900 910 920

GTAGCTGCCGATGTCCA CAGGAAGAAGAAGGAGGAGGAGG — AGGCT ATGAGCTGTG ATGT ACT AT 

• • I 1 • till it t i l < i i i lit till i it i i i i i ill ii l 

• • • 1 * ■> till ii i i lilt* ill i i i • i ii ttiit ill ii i 

ATAGCT GAGTTCCATCTCATGTTTTTCCA-CAGGAGCAGGCCAGGCCA— GA— CTGTGCTAATG— CCTCT 

2920 2930 2940 2950 2960 2970 2980 


X 

C 

CCT AC ACAGT A 
X 2990 


6. ELLIS-267-3 A 

FLAP1M Influenza A/nt/60/68 (h3n2), polymerase 1 (seg 2).

LOCUS FLAP1M 2341 bp ss-RNA VRL 30-JUN-1987
DEFINITION Influenza A/nt/60/68 (h3n2), polymerase 1 (seg 2). cdna.
ACCESSION J02138
KEYWORDS RNA polymerase; polymerase.
SOURCE influenza from human.
ORGANISM Influenza virus type A
Viridae; ss-RNA enveloped viruses; Negative strand RNA viruses;
Orthomyxoviridae; Influenzavirus; Influenza A viruses; Influenza
virus type A.
REFERENCE 1 (bases 1 to 2341)
AUTHORS Bishop,D.H.L., Huddleston,J.A. and Brownlee,G.G.
TITLE the complete sequence of rna segment 2 of influenza a/nt/60/68 and
its encoded p1 protein
JOURNAL Nucleic Acids Res. 10, 1335-1343 (1982)
STANDARD full staff_review
COMMENT Sequence derived from cloned cDNA (a/nt/60/2/68/1962); bases
518-1693 also obtained independently with separate cloned cdna
(371). First 12 and last 13 bases questionable. Assignment of
coding region by consideration of open reading frames.
FEATURES Location/Qualifiers
CDS 25..2298
/note="polymerase 1"
unsure 1644
/note="g in clone 371; a in clone a/nt/60/2/68/1962"
BASE COUNT 827 a 460 c 530 g 524 t

ORIGIN 3’ end of vrna.


Initial Score = 57 Optimized Score = 419 Significance = 4.29

Residue Identity = 51% Matches = 516 Mismatches = 349

Gaps = 135 Conservative Substitutions = 0


X 10 20 30 40 50

ATGTCCATGAACTGCTGAGTGGATAAACAGCACGGGAT ATCTCTGTC— T AAAGGAA

• ii il till il 

i i l ii l i l i il 


G A AC ACAT C A AT ATT CAG AA A AAGGGA AGTGGACA ACA AAC ACGGA AACTGGAGCGCCCCA ACTT A ACCC AA 


160 170 180 190 200 210 220

60 70 80 90 100 110


T AT— T ACT AC— ACC AGGAAAAGGA— CACATTCGACA— ACAGGAAAGGAGCCTGTCACAGAAAACCACA 



I fuim wiwir twwr iw » nwu t uinwun I mn I unuounnu t UUM I M I iJL/nuriMnOMmnU 1 KJi I LH I 0~*0 l L3U»MHU!U»MH i *J3 


230 240 250 260 270 280 290 300


120 130 140 150 160 170 180 190 

G-TGTCCTGTGCATGTGACATTTCGCCATGGGAAACAACTGTTACAACGTGGTGGTCATTGTGCTGCTGCTA 

• 1 ii»i »i i ti II i ii III i ii i i III i it lit i«i i ii i 

• • i i • 1 ii i li ii l '» i ill i ii i i iti i it lit lit i ii i 

GCTTTCCT— TGAA G A— AT CCC ACCC AGGG ATCTTTGAAAAC— TCGT— GTC— TTGAAACGATGGAA 

310 320 330 340 350 360 

200 210 220 230 240 250 

GTGG GCTGTGAGAAG— GTGG— GAGCCGTGCAGAACTCCTGTGATAACTGTCAGCCTGGTACTTTCTGCAG 

1,1 l i l i l i i l ill i ll l i i l l i l i I l l l i i i i ill 

1,1 ill ill I ll I i i l l i l I l i i l l l l l ill 

GTTGTTCAACAAACAAGGGTGGACAGACTGACCCAAGGTC-GTCAGACCTAT — G ATTGG— AC ATT A A AC AG 
370 380 390 400 410 420 

260 270 280 290 300 310 320 

A A A AT AC A AT CCAGT CTGC AAG AGCTGCCCTCC A AGT ACCTTCTCCAGCAT AGGTGG ACAG-CCGA A — CTG 

! ! ! !!!!!!!!!!' • • • » • • » • • » * • • » • • « • « • * i i 

• 1 * * i • i i i i i i i i iii i ii ,i iii till iii i i i i i 

AAA — T C A A— CCGG— CCGC A A CTACATT AG — CCAACACT A— T AGAAGTCTTCAGATCGAATGGTC 

430 440 450 460 470 480 

330 340 350 360 370 380 390 

T AACATCTGCAGAGTGTGTGCAGGCT ATTTCAGGTTCAAGAAGTTTTGCT — CCTCT ACCCAC AACGCGGAG 

• • • • * 11 • « « l l i 1 l I l i i i i i i i i l i I I i III l I i I l l i 

• 1 • 1 • * ' i i i i i i i ■ i t < i i i ii i i i ■ i i i iri i i i i i i i 

T AACAGCT AATGAGT— CGGGAAGGCT AAT AGATTTCCTCAAAGATGTGATGGAATCAATGGAT AA AGAGGAA 
490 500 510 520 530 540 550 

400 410 420 430 440 450 460 

TGTG AGTGC ATTGAAGGATT CC ATTG — CTTGGGGCCACAGTGCACCAGATGTGAAAAGGACTG — CAGGC— 

1,1 ’ 1 > l l l l I l I l I l I I l l I l ll III ll I 

' 1 1 ' 1 lllil i l l I l I l l I l l l ll III ll i 

ATGG AG AT A AC A AC AC ACTT CCA A AG A A A A AG A AG AGT A AG AG- AC A AC ATGACCA AG A A A ATGGTC ACACA 
560 570 580 590 600 610 620

470 480 490 500 510 520 

CTGGCCAGGAGCT A — ACGAAGCAG— GGTTG — CAAAACCTG— T AGCT — TGGGAACATTT A— AT — GACCAG 

1 11 11 ' » i l i i i i i i ll III i i ii ii l ll i i i i i ii i| || 

• • • • • » ■ i i i i i i i i i i iii i i ii i i i ii i i i i i ii ii ii 

AAGAACAAT AGGAAAGAAGAAGCAGAGAGTGAACAAGAGAAGCT ATCT AAT AAGAGCATT A AC ATTG A AC AC 
630 640 650 660 670 680 690 700 

530 540 550 560 570 580 590 

AACGGT ACTGGCGTCTGTCGACCCTGGACGAACTGCTCTCT AGACGGAAGGTCT— GTGCTT AAGA— CCGGGA 

• * • 11 • < > it III i III till ll ill i i i i i i t I I 

111 11 1 11 11 11 lit i lit till it til i i i | | | | | | 

AATG — ACCAAAG— ATGCAGAAAGAGG — TAAAT TAAAGA — GAAGAGCT ATTGC — AACACCCGGGA 

710 720 730 740 750 760 


600 610 620 630 640 650 

— CCACGGAGAAGG— ACGTG — GTGTGTGGACCCCCT— GTGGTGAGCTTCTCTCCCAGTACC ACCATTT 


i i i I 


TGCAAATCAGAGGGTTCGTGT ACTTTGTTGAAACTCT AGCT AGGAGCATTTGT — GAGAAGCTTGAACA— GT 
770 780 790 800 810 820 


660 670 680 690 700 710 720 730

CTGTGAC— TCCAGAGGGAGGACCAGGAGGGCACTCCTTGCAGGTCCTTACCTTGTTCCTGGCGCTGACATCG 

• » • « « 1 i i « i i ii ii ii ill i i lilt till • 

i • i i i > > i i > i i i i i i ii ii ii iii i i i i i i lilt i 

CTG— GACTT CCAGTTGG AGGT AATGA AA AGA AGGCCA AACTGG — CAAATGTTG— TGAGAAAGATGATGACT 
830 840 850 860 870 880 890 


740 750 760 

GCTTTGCTGCTGGCCCTGATCTTCATTACTC- 


ll ll 


770 780 790 

TCCTGTTCTCTGTGCTCAAATGGATCAGGAAA AAA 

• ill l i ll i i i i i i i i i i i i ill 

■ lit l i il i i i i i i i i iiii iii 


AATT — CACAAGACACAGAGCTTTCTTTCACAATT ACTGGAGACAAT ACT— AAATGGA — ATGAAAATCAAA 
900 910 920 930 940 950 960 


800 810 820 830 840 850 

TTCCCCCACAT ATT— CAAGCAACCATTT AAGA— AGACCACTGGAGC— AGCTCAA GAGGAAGATGCT— 

III III III ll l l l l t i I till III 

>•••••••• i iii til ill ti i i i • i i i iiii lit 


i i i ww » wuii i i i i w» i i ww » uiuiwuin » un i i nwm t n i n i wn\-/nnnnnn * i ynn i mui i » ynuinnnL'ui I l L» I □ 




1000 1010 1020


860 870 880 890 900 910 920 

TG — TAGCTGCC — GATGTCCACAGGAAGAAGAAGGAG-GAGGAGG AGGCT ATGAGCTGTGATGT ACT A 

< ill it i i i I i i i i i i l i i l i i i ill i i l i i ii i i i i I 

l ill it i i i i i i i i i i i i i i i i i ill i i i i i ii till i 

AGCATCGCACCCAT AATGTTCTCA — AACAAAATGGCGAGACT AGGGA AAGGAT ACATGTTCGAAAGT AAGA 
1040 1050 1060 1070 1080 1090 1100 


X 

TC 


GCATGAAGCTCC 
1110


7. ELLIS-267-3A 

ECOORI E. coli replication origin (oriC) and asnA gene cod

LOCUS ECOORI 2675 bp ds-DNA BCT 15-SEP-1989

DEFINITION E. coli replication origin (oriC) and asnA gene coding for
asparagine synthetase A.

ACCESSION J01657 X02820

KEYWORDS asnA gene; asparagine synthetase; oriC gene; origin of replication;

synthetase; unidentified reading frame.

SOURCE Escherichia coli K12 DNA.

ORGANISM Escherichia coli

Prokaryota; Bacteria; Gracilicutes; Scotobacteria; Facultatively
anaerobic rods; Enterobacteriaceae; Escherichia; coli.

REFERENCE 1 (bases 1 to 1105) 

AUTHORS Sugimoto,K., Oka,A., Sugisaki,H., Takanami,M., Nishimura,A.,
Yasuda,Y. and Hirota,Y.

TITLE Nucleotide sequence of Escherichia coli K-12 replication origin 
JOURNAL Proc. Natl. Acad. Sci. U. S. A. 76, 575-579 (1979) 

STANDARD full staff_review 
REFERENCE 2 (bases 57 to 575) 

AUTHORS Meijer,M., Beck,E., Hansen,F.G., Bergmans,H.E.N., Messer,W., Von
Meyenburg,K. and Schaller,H.

TITLE Nucleotide sequence of the origin of replication of the Escherichia 

coli K-12 chromosome 

JOURNAL Proc. Natl. Acad. Sci. U. S. A. 76, 580-584 (1979) 

STANDARD full staff_review 
REFERENCE 3 (bases 506 to 2675) 

AUTHORS Nakamura,M., Yamada,M., Hirota,Y., Sugimoto,K., Oka,A. and

Takanami,M.

TITLE Nucleotide sequence of the asnA gene coding for asparagine

synthetase of E. coli K-12
JOURNAL Nucleic Acids Res. 9, 4669-4676 (1981)

STANDARD full staff_review
REFERENCE 4 (bases 6 to 527; 834 to 906) 

AUTHORS Lother,H. and Messer,W.

TITLE Promoters in the E. coli replication origin 

JOURNAL Nature 294, 376-378 (1981) 

STANDARD full staff_review 
REFERENCE 5 (bases 126 to 395) 

AUTHORS Oka,A., Sugimoto,K., Sasaki,H. and Takanami,M.

TITLE An in vitro method generating base substitutions in preselected

regions of plasmid DNA: Application to structural analysis of the
replication origin of the Escherichia coli K-12 chromosome
JOURNAL Gene 19, 59-69 (1982)

STANDARD full staff_review 
REFERENCE 6

AUTHORS Matsui,M., Oka,A., Takanami,M., Yasuda,S. and Hirota,Y.

TITLE Sites of dnaA protein-binding in the replication origin of the

Escherichia coli K-12 chromosome
JOURNAL J. Mol. Biol. 184, 529-533 (1985)

COMMENT [6] sites; dnaA binding sites.

Directly [illegible] authors through Dr. Ooi of Kyoto Univ. The
422 bp region 106-527 contains ori (origin of replication), whose
probable left end is between 128 and 140, and whose probable right
end is 371 or 372. [5] reports many mutants that provided evidence

that ori contains special regions, spacer sequences, which separate
neighboring recognition sites.


FEATURES Location/Qualifiers
CDS complement(<1..17)
/note="putative 21k protein"
CDS complement(396..839)
/note="putative 16k protein"
CDS 1539..2531
/note="asparagine synthetase A (asnA)"
/gene="asnA"
misc_RNA complement(<1..271)
/note="p oriL RNA transcription"
misc_RNA 418
/note="p oriR RNA transcription (alt.)"
misc_RNA 428
/note="p oriR RNA transcription (alt.)"
misc_binding 182..197
/note="dnaA major binding site A [J. Mol. Biol. 184,
529-533 (1985)]"
misc_binding 237..252
/note="dnaA minor binding site X1 [J. Mol. Biol. 184,
529-533 (1985)]"
misc_binding 288..303
/note="dnaA major binding site B [J. Mol. Biol. 184,
529-533 (1985)]"
misc_binding 323..338
/note="dnaA minor binding site X2 [J. Mol. Biol. 184,
529-533 (1985)]"
misc_binding 362..377
/note="dnaA major binding site C [J. Mol. Biol. 184,
529-533 (1985)]"
conflict replace(105..105,"a")
/citation=[2]
conflict replace(105..105,"a")
/citation=[4]
conflict replace(543..545,"1 ac")
/citation=[2]

BASE COUNT 635 a 626 c 725 g 689 t

ORIGIN 1 bp upstream of BamHI site.

Initial Score = 67 Optimized Score = 418 Significance = 4.12

Residue Identity = 51% Matches = 508 Mismatches = 358

Gaps = 130 Conservative Substitutions = 0


X 10 20 30 40 50

ATGTCCATGAACTGCTGAGT — GGAT AAACA— GCACGGGAT ATCTC— TGTCT AAAGG — AAT 

• •I till i l i t i i ill i i i til i l i l i it i 

ill iiii i i i i i i tti i i i lit i i i i i ii i 

TTCTTTTTT AATG — AATCAAAAG— TGAGTT AGGCTTTTT ATTGAATGATT ATTGCATGTGTGTCGGTTTTT 
1430 1440 1450 1460 1470 1480 1490 

60 70 80 90 100 110 120 

ATTACT ACACCAGGAAAAGGACAC ATTCGACAACAGGAAAGGAGCCTGTCACAGAAAACCACAGTG 

i i i • ii i i i i i i i i i i ii i l i i I l i i ill li i i i i i ii 

iiii ii i i i i i i i i i ■ ii i i i i i i i i ill ii i i t i i ii 

. GTTGCTT AATCAT AAGCAACAGGACGCAGGAGT AT AAAAAATGAAAACCG— CTT ACATTGCCAAACAACGTC 
1500 1510 1520 1530 1540 1550 1560 

130 140 150 160 170 180 190 

TCCTGTGCAT— GTGACATTTCGCCATGGGAAACAACTGTTACAACG — TGGTGGTCATTGTGCTGCTGCT AG 


IIII II 


AAATTAGCTTCGTGAAATCTCACTTTTCTCGTCAACTGGAAGAACGTCTGGGGCTGATCGAAGTCCAG— GCG 


200 210 220 230 240 250

TGGGCTGTGAGAAG-GTGGGAGCCGTGCAGAACTCCTGTGATAAC-TGTCAGCCTG-GTACTTTC — TGCAG


<111 II 


I III I 


CCGATTCTTAGCCGTGTGGGGGATG-GC ACGC — AGGATAACTTGTCGGGCTGTGAAAAAGCGGTGCAG 

1650 1660 1670 1680 1690 1700 

260 270 280 290 300 310 320 

— AAAAT ACAATCCAGTCTGCAAGAGCTGCCC TCCAA GT ACCTTCTCCAGCAT AGGTGGACA-GCC 


ill i 


iiii 


GT AA A AGTG A A AGC — TCTGCCTGA — TGCCCAGTTCGAAGTGGTTCATTCACTGGC— GAAGTGGAAACGTC 
1710 1720 1730 1740 1750 1760 1770 

330 340 350 360 370 380 

-GAACT GTAACA — TCTGCAGAGTGTGTGCAGGCT ATTTCAGGTTCAAGAAGTTTTGCTCCTCT ACCC 


II II 


i lilt 


ll ill 


AGACCTT AGGGCAACACGACTTCAGCGCGGGCGAAGGGCTGT ACACGCACATGAA AGC-CCT-TCGCC 

1780 1790 1800 1810 1820 1830 

390 400 410 420 430 440 450

ACAACGCGGA — GTGTGAGTGCATTGAA GGATTCCATTGCTTGGGGCCACAGTGCACCAG — ATGTGAA 

• i i ll ill i i i i i i ll i lit ii i i i I l i i till 

I*' ll ill i i i i i i ii i ill ii i t i i i i i i i i i 

CCGATGAAGACCGTCTTTCTCCGTTGCACTCGGTCTATGTTGACCAGTGGGACTGGGAACGCGTAATGGGCG 
1840 1850 1860 1870 1880 1890 1900 1910 

460 470 480 490 500 510 520

AAGGACTGCAGGCCTGGCCAGGAGCT AACGAAGCAGGGTTG— CAAAACCTGT AGCTTGGGAACATTT AATGA 

iiii i i ill i i i i i i i i i till ii iiii i i i i i i 

'll till I i ill i i i i i i i i i iiii ii iiii i i i i i i 

ACGGTGAGC— GTCAATTCTCGACTCTGA— AAAGCACGGTAGAGGCGATCTG— GGC — GGGA — ATTAAAGCA 
1920 1930 1940 1950 1960 1970 

530 540 550 560 570 580 590 

CCAG A ACGGT ACTGGCGT CTGT CG ACCCTGG AC — GA ACTGCT CTCT AG ACGG AAGGTCTGTGCTT AAG ACC 

• i 1 > • i • ' ' • ll iiii i i i t i i i i i i i i i ill i ii 

' i ' i ' ' i > > i ll iiii i i t i i i i i i i i i i ill i it 

ACCG A A-GCTGC-GGTT AGCG A AG AGTTTGGCCTGGCACCGTTC-CT-GCCGG A TCAGATC CACT 

1980 1990 2000 2010 2020 2030 

600 610 620 630 640 650 660 

GGGACCACGGAGAAGGACGTGGTGTGTGGACCCCCTGTGGTGAGCTTCTCTCCCAGTACCACCATTTCTGTG 


TCGT ACACAG— CCAGGAGTT ACTGT- 
2040 2050 2060 


-CTCGTT ATCCGGATCTT-GATGCCA — AAGGGCGTGAGCG— G 
2070 2080 2090 


670 680 690 700 710 720 730 

ACTCCAGAGGGAGGACCAGGAG-GGCACTCCTTGCAGGTCCTTACCTTGTTCCTG-GCGCT-GACATCGGCT 

' III II i i i I 1 I i I i i i l I ll ll i I i 1 I i i i i l l i I i i 

* I' ' 'I ' • ' • II • llllll ll II | III III I I till! I 

GCGATAGCGAAAGATCTTGGCGCGGTATTCCTTGTCGG-GATTGGCGGCAAGCTGAGCGATGGTCATCGCCA 
2100 2110 2120 2130 2140 2150 2160 2170 

740 750 760 770 780 790 800 

TTGC— TGCTGGC— CCTGATCTTCATTACTCTCCTGTTCTCTGTGCTCAAATGGATCAGGAAAAAATTCCCCC 

•'ll i i i I i i i iiii ll i i i i i i i lit ill i | 

iiii i i i i i i i iiii ii i • i i i i i ill ill i i 

CGACGTGCGCGCACCGGATT ATGATGA — CTGGAGCACCCCGT— CAGAGCTGGGCCATGCGGGTCTGAACGG 
2180 2190 2200 2210 2220 2230 

810 820 830 840 850 860 

ACATATTC AAGCAACCATTT A — AGAAGA— CCACTGGAGC — AGCTCAAGAGGAAGATGCTTGTAGCT 

• • ' * 1 • * iiii ii i l i i i i i iiii ill i l i ill i i i i i i i 

1 * 1 > • ' • i t i i ii i i l l i i l iiii ill i it ill i i i i i i i 

CGAT ATTCTGGTGTGGAACCCGGT ACTGGAAGATGCGTTTGAGCTTTCCTCCATGGG — GATCCGTGTAGAT 
2240 2250 2260 2270 2280 2290 2300 

870 880 890 900 910 920 X

GCCGA TG TCCACAGGAAGAAGAAGGAGGAGGAGGA— GGCT ATG AGCTGTG ATGT ACT ATC 

'''I' '* ll iiii l ll l ii ii li ii iiilltlili ill 

• 1 i » ' •• i i l i i i i ii i i i l i i i ii i i i l l i l i i i ill 

GCCGACACGCTGAAGCATCAACTGG— CGCTGACCGGTGACGAAGATCGCCTGGAGCTG— GA— GTGGCATCAG 




8. ELLIS-267-3 A 

ECOORIASN E. coli replication origin (oriC) and asnA gene cod

LOCUS ECOORIASN 4012 bp ds-DNA BCT 15-SEP-1989

DEFINITION E. coli replication origin (oriC) and asnA gene coding for
asparagine synthetase A.

ACCESSION KOOS26

KEYWORDS asnA gene; asparagine synthetase; minichromosome; oriC gene;
origin of replication; synthetase.

SOURCE Escherichia coli, clone (minichromosome) pCM959, DNA.

ORGANISM Escherichia coli

Prokaryota; Bacteria; Gracilicutes; Scotobacteria; Facultatively
anaerobic rods; Enterobacteriaceae; Escherichia; coli.

REFERENCE 1 (bases 1 to 4012)

AUTHORS Buhk,H.-J. and Messer,W.

TITLE The replication origin region of Escherichia coli: nucleotide

sequence and functional units
JOURNAL Gene 24, 265-279 (1983)

STANDARD simple staff_review

COMMENT Plasmid pCM959 was obtained in vivo; it is a small circular

minichromosome containing only E. coli chromosomal DNA. The
circularisation point is at 1; this sequence represents the
complete pCM959 sequence.

FEATURES Location/Qualifiers

CDS complement(<1..589)

/note="21K protein"

CDS complement(968..1411)

/note="16K protein"

CDS complement(1501..1959)

/note="17K protein"

CDS 2111..3103

/note="asparagine synthetase A"

/gene="asnA"

BASE COUNT 947 a 1025 c 1052 g 988 t

ORIGIN 207 bp upstream of BglII site.


Initial Score = 67 Optimized Score = 418 Significance = 4.12

Residue Identity = 51% Matches = 508 Mismatches = 358

Gaps = 130 Conservative Substitutions = 0


X 10 20 30 40 50



ATGT CCATGAACTGCTG AGT — GG AT A A ACA— GCACGGGAT AT CT C— TGT CT AA AGG — AAT 

III l l t l 1 1 1 1 1 I ill 1 1 l ill I 1 i 1 1 11 t 

ill till tiiiii 111 1 1 1 lit 1 1 1 1 1 11 1 

TTCTTTTTT AATG — AATCAAAAG— TGAGTT AGGCTTTTT ATTGAATGATT ATTGCATGTGTGTCGGTTTTT 
2010 X 2020 2030 2040 2050 2060 2070 

60 70 80 90 100 110 120

ATTACT ACACCAGGAAAAGGACAC ATTCGACAACAGGAAAGGAGCCTGTCACAGAAAACCACAGTG 

till it 1 1 1 1 i 1 1 1 1 1 11 iiiiiiii ill 11 1 1111 11 

1 1 1 i it 1 1 1 1 1 1 < 1 l 1 11 iiiiiiii ill t l 1 1 1 1 1 11 

GTTGCTT AAT CAT AAGCA ACAGG ACGC AGG AGT AT AA A A A ATGA A A ACCG-CTT AC ATTGCCA A AC AACGT C 
2080 2090 2100 2110 2120 2130 2140 

130 140 150 160 170 180 190 

TCCTGTGCAT— GTGACATTTCGCCATGGGAAACAACTGTTACAACG — TGGTGGTCATTGTGCTGCTGCT AG 

1 l l l 1 1 1 1 1 l l I 1 l l I • l 1 I 1 1 1 I 1 iiiiiiii III l 

1 1 • 1 1 1 1 1 1 1 1 l 1 1 i 1 t • 1 1 1 1 1 1 1 iiiiiiii 111 1 

AAATTAGCTTCGTGAAATCTCACTTTTCTCGTCAACTGGAAGAACGTCTGGGGCTGATCGAAGTCCAG-GCG 
2150 2160 2170 2180 2190 2200 2210 

200 210 220 230 240 250 

TGGGCTGTGAGAAG— GTGGGAGCCGTGCAGAACTCCTGTGATAAC— TGTCAGCCTG— GTACTTTC — TGCAG 


GCGCTGCT 

2380 



< I t I I I I I I I I I III I I I I I I I I 

GC — AGGATAACTTGTCGGGCTGTGAAAAAGCGGTGCAG 
2250 2260 2270 


260 270 280 290 300 310 320

— AAA AT ACAATCCAGTCTGCAAGAGCTGCCC TCCAA GT ACCTTCTCCAGCAT AGGTGGAC A-GCC 

• • 1 ' * 1 * l i i • i it i I i i i ll it it i lit i it l i i i i i i i i 

• » * 1 1 * • t i i i i il i t i i t ll it ii i iii i ti i t i i i i i i i 

GT AAAAGTGAAAGC — TCTGCCTGA — TGCCCAGTTCGAAGTGGTTCATTCACTGGC-GAAGTGGAAACGTC 
2280 2290 2300 2310 2320 2330 2340 


-GAACT- 

it it 
il i« 


330 340 350 360 370 380 

-GTAACA — TCTGCAGAGTGTGTGCAGGCTATTTCAGGTTCAAGAAGTTTTGCTCCTCTACCC 


AGACCTT AGGGCAACACGACTTCAGCGCGGGCGAAGGGCTGT ACACGC ACATGAA AGC-CCT-TCGCC 

2350 2360 2370 2380 2390 2400 2410 


390 400 410 420 430 440 450 

ACAACGCGGA — GTGTGAGTGCATTGAA GGATTCCATTGCTTGGGGCCACAGTGCACCAG — ATGTGAA 

• * * • • ill 1 i i i l I ii i lit il i i i i i i t i • i i 

lit il III i i i i i i it i ill it i i i i i i i till 

CCGATG A AG ACCGT CTTTCT CCGTTGC ACT CGGT CT ATGTTGACC AGTGGG ACTGGGAACGCGT A ATGGGCG 
2420 2430 2440 2450 2460 2470 2480 


460 470 480 490 500 510 520 

A AGG ACTGC AGGCCTGGCC AGGAGCT A ACGAAGC AGGGTTG— C A AAACCTGT AGCTTGGG A AC ATTT A ATG A 


ACGGTGAGC- 

2490 


•GTCAATTCTCGACTCTGA— AAAGCACGGTAGAGGCGATCTG-GGC — 
2500 2510 2520 2530 


■GGGA — ATTAAAGCA 
2540 


530 540 550 560 570 580 590 

CCAGAACGGTACTGGCGTCTGTCGACCCTGGAC — GAACTGCTCTCT AGACGGAAGGTCTGTGCTT AAGACC 



ACCGAA— GCTGC— GGTTAGCGAAGAGTTTGGCCTGGCACCGTTC— CT— GCCGGA TCAGATC CACT 

2550 2560 2570 2580 2590 2600 


600 610 620 630 640 650 660 

GGGACCACGGAGAAGGACGTGGTGTGTGGACCCCCTGTGGTGAGCTTCTCTCCCAGTACCACCATTTCTGTG 

I till I I I I I III II II I I I I I III! I || l| 

• till I I t I I III II It I I I I I lilt | II II 

TCGT ACACAG— CCAGGAGTT ACTGT CTCGTT ATCCGGATCTT— GATGCCA — AAGGGCGTGAGCG— G 

2610 2620 2630 2640 2650 2660 2670 

670 680 690 700 710 720 730 

ACTCCAGAGGGAGGACCAGGAG— GGCACTCCTTGCAGGTCCTTACCTTGTTCCTG— GCGCT— GACATCGGCT 

* l*i ll » * * l l ■ ■ * l < ■ l t ll It l I ■ i I I i > i i i | i i | 

» ' ' • • • • ' ' • it i i t i i i i ii li i III ill i i i i • i i • 

GCGATAGCGAAAGATCTTGGCGCGGTATTCCTTGTCGG— GATTGGCGGCAAGCTGAGCGATGGTCATCGCCA 
2680 2690 2700 2710 2720 2730 2740 


740 750 760 770 780 790 800 

TTGC— TGCTGGC— CCTGATCTTCATTACTCTCCTGTTCTCTGTGCTCAAATGGATCAGGAAAAAATTCCCCC 

• 1 * * 1 l i t t l I l t i l II i I I i I I I ill III I i 

' ill i i i i i i t till il i i t i l i i iii iii i i 

CGACGTGCGCGCACCGGATT ATGATGA — CTGGAGCACCCCGT— CAGAGCTGGGCCATGCGGGTCTGAACGG 
2750 2760 2770 2780 2790 2800 2810 


810 820 830 840 850 860 

ACATATTC AAGCAACCATTT A — AGAAGA— CCACTGGAGC — AGCTCAAGAGGAAGATGCTTGT AGCT 

’ 1 1 1 1 1 1 till ll lilt! I | till 111 t ll III I | | | | | t 

• » 1 > 1 » i i i i i ii i i i i i i i till til i ii ill i t i i i i i 


CGAT ATTCTGGTGTGGAACCCGGT ACTGGAAGATGCGTTTGAGCTTTCCTCCATGGG — GATCCGTGTAGAT 
2820 2830 2840 2850 2860 2870 2880 


870 880 890 900 910 920 X 

GCCGA TG TCCACAGGAAGAAGAAGGAGGAGGAGGA— GGCT ATG AGCTGTG ATGT ACT ATC 


GCCGACACGCTGAAGCATCAACTGG— CGCTGACCGGTGACGAAGATCGCCTGGAGCTG— GA— GTGGCATCAG 
2890 2900 2910 2920 2930 2940 2950 


GCGCTGCT 



9. ELLIS-267-3 A
RATBAND33E  Rat band 3 Cl-/HCO3- exchanger (B3RP2) mRNA, compl

LOCUS       RATBAND33E   4057 bp ss-mRNA   ROD   15-JUN-1990
DEFINITION  Rat band 3 Cl-/HCO3- exchanger (B3RP2) mRNA, complete cds.
ACCESSION   J05166
KEYWORDS    3 Cl-/HCO3- exchanger.
SOURCE      Rat stomach, cDNA to mRNA, clones RSAE[2-1,3-1].
  ORGANISM  Rattus norvegicus
            Eukaryota; Animalia; Metazoa; Chordata; Vertebrata; Mammalia;
            Theria; Eutheria; Rodentia; Myomorpha; Muridae; Murinae; Rattus;
            norvegicus.
REFERENCE   1 (bases 1 to 4057)
  AUTHORS   Kudrycki,K.E., Newman,P.R. and Shull,G.E.
  TITLE     cDNA cloning and tissue distribution of mRNAs for two proteins that
            are related to the band 3 Cl-/HCO3- exchanger
  JOURNAL   J. Biol. Chem. 265, 462-471 (1990)
  STANDARD  simple staff_entry
FEATURES    Location/Qualifiers
  CDS       201..3905
            /note="Cl-/HCO3- exchanger (B3RP2)"
  mRNA      <1..4057
            /note="B3RP2 mRNA"
BASE COUNT  830 a 1179 c 1205 g 843 t
ORIGIN


Initial Score    =        Optimized Score = 418   Significance = 4.12
Residue Identity = 50%    Matches         = 508   Mismatches   = 362
Gaps             = 137    Conservative Substitutions           = 0


X 10 20 30 40 50

ATGTC CATGAAC— TGCTGAGTGGAT AAACAGCACGGGAT ATCTCTGT CTAAAGGA 


CTGGCCCCACACCTCGGGCACGACCACGGGCCCCCCATAAGCCTCATGAGGTGT— TC— GTAGAGCTGAATGA 
1110 1120 1130 1140 1150 1160 1170 


60 70 80 90 100 110 120

AT ATT A— CT ACACCAGGAAAAGGA — CACATTCGACAACAGGAAAGGAGCCTGTCACAGAAAACCACAGTGT 

I » ' * III I I I I I I I I I I I I I t I till! I I I I I It 

>1 * ■ III I I I I I I I I I I K III I 111 * | | till || 

ATTGCAGTTGGACAAAAACCAGGAGCCTCAGTGG CGGGAGA-CAGCCCGGTGGATAAAATTTGAGGAG 

1180 1190 1200 1210 1220 1230 1240 


130 140 150 160 170 180 190 

CCTGTGCATG— TGACATTTCGCCATGGGAAACAACTGTTAC — AACGTGGTGGTCATTGTGCTG— CTGCT AG 

• 'll* till ii i i i i ill III ii i i i i i i i I • i • i i 

l>lll lilt II till III Ilf II I I I I I I I I I I I I I 

GACGTGGAAGAGGAGACTGAGCGCTGGG — GCAAGCCTCACGTGGCATCACTGTCCTTCCGCAGCCTCCTGG 
1250 1260 1270 1280 1290 1300 1310 


200 210 220 230 240 250 260 

TGGGCTGTGAGAAGGTGGGAGCCGTGCAGAACTCCTGT — GAT AACTGTCAGCCTGGT ACTTTCTGCAGAAA 


i i i I 


AGCT CCGC AGG AC ACTGG — CCC ATGGAGCTGTGCT CTTGG ACCT CGATC AG-C AG ACCCTGCCTG — GGGT 
1320 1330 1340 1350 1360 1370 1380 


270 280 290 300 310 320 

AT ACAATCCAGTCTGCAAG— AGCTGCCCTCCAAGT ACCTTCTC— CAGCAT AGGTGGAC AGCCGAACTG— T A- 

i i i i li i I I i i I i • ill ii i i 1 i i « i i i i i i i i i i i 

i i i i it i i i i i i i i ill ii i i i i i i i i i i i i i i i i • 

GGCCCATCAGGTGGTCGAGCAGATGGTTATCTCTGACCAGATCAAAGCAGAGG— ACAGAGCCAATGTGCTAC 
1390 1400 1410 1420 1430 1440 1450 


330 340 350 360 370 380 390 

-A CATCTGCAGAGTGTGTGCAGGCTATTTCAGGTTCAAGAAGTTTTGCTCCTCTACCCACAAC— GCG 

t i i i i i t i i ii ii ill i i i I i i i i i i i • till iii 

i i i i i i i i i ti ii tit i i i i i i i t i i i i iiii iii 

GAGCCCTTCTGCTGAAACACAGCCACCCAAGTGATGAGAAAGAA— TTCTCCTTCCCCCGGAACATCTCAGCG 







400 410 420 430 440 450

GAGTGTGAGTGC--ATTG A AGG AT-TCCATTGCTTGGGGCC--ACAGTGCACCAGATGTGAAAAGGACT


GGCTCTCTGGGCTCTCTCCTGGGGCATCACCACGCCCAGGGGACTGAGAGTGATCCTCACGTCACTGAGCCT 
1530 1540 1550 1560 1570 1580 1590 


460 470 480 490 500 510 520 

— GCAGGCCTGG CCAG— GAGCT AAC— GAAGCAGGGTTGCAAAACCTGT AGCTTGGGA ACATTT A ATGAC 

I i til i l i i i 1 i i i i i i i i i i i i i i i i i i i ii 

i • til i i i i i i i i i i i i i i > i i i i i i t i i i t i 

CTCATCGGTGGTGTTCCTGAGACCCGGCTGGAGGTGGATAG-AGAGCGTG-AGCTGCCGCCCCCAGCCCCAC 
1600 1610 1620 1630 1640 1650 1660 

530 540 550 560 570 580 590 

CAGAACGGT ACTGGCGTCT— GTCGACCCTGGACGAACTG— CTCT— CT AGACGGAAGGTCTGTGCTT AAGACC 

• • * i I i i i i <1 til i i i i i i i i i i i i i i i i i i i ii ii i 

* i ' ' » ' ' » i ii ill i i i < i i i i i i i i i i i i I i i ii ii t 

CTGC A— GGT ATT ACCCGCTCCAAGTCCAAGCATGAGCTGAAGCTGCTGGA — GAAGATCCCTG— AGAATGCA 
1670 1680 1690 1700 1710 1720 1730 


600 610 620 630 640 650 

GGGACCACGGAGAAGGACGTGGTGTGTGGACCCCCTGTGGTGAGCTTCTCTCCCAGTACCACCA TTT 

I I I I I I I II I I III I I I I i I III l l l l l i i | | I I I I 1 I III 

I • I l » • i II I l III l I l l i I III l tltillt i i i I i i i III 

GAGGCCACAG TGGTCCTCGTGGG CTGTGTGGAGTTCCTCTCCC-GCCCCACCATGGCCTTT 

1740 1750 1760 1770 1780 1790 


660 670 680 690 700 710 720

CTGTGACT-CCAGAGG — GAGGACCAGGAGGGCACTCCTTGCAGGTCCTTACCTTGTTCCTG— GCGCTGACA 

ii i ii i till i ill i ill ill i i ii iiii i ill lilt i i i i i i 

ii i <■ ■ iiii i i«i i ill ill i i ii iiii i tit iiii • i i i i i 

GTGCGCCTGCGGGAGGCTGTGGAACTGGA— TGCAGTAC— TGGAGGT GCCTGTGCCTGTGCGCT-TCC 

1800 1810 1820 1830 1840 1850 

730 740 750 760 770 780 790 

TCGGCTTTGCTGCT — GGCCCTGATCTTCATTACTCTCCTGTTCT— CTGTGCTCAAATGGATCAGGAAAAAA 

>1 l l i i i i i i l i l l l i ii i ill ill ii till i ii i 

• i > i i i i i i i i t i i i i ii i lit lit ii iiii i ii i 

TCTTC-CTGCTGCTGGGGCCCAG CAGCGCCAACATGGACT ACCATG AGATTGGCC — GATCCAT 

1860 1870 1880 1890 1900 1910 


800 810 820 830 840 850 860 

TTCC— CCCACAT AT TCAAGCAA — CCATTTAAGAAGACCACTGGAGCAGCTCAAGAGGAAGA — TGCTT 

• Ii i t i III • i i i i l l i III I l ii i it 1 iiii i |i ill ii iiii 

• ii ill lit i i • i i • i • lit i i ii i ii i iiii i it ill ii iiii 

CTCCACCCTCATGTCTGACAAGCAATTCCA— CGAGGCAGCCTACCTG-GCAGATGAACGGGATGACTTGCTG 
1920 1930 1940 1950 1960 1970 1980


870 880 890 900 910

GT AGCTGCCGATGTC-CACAGGA-AGAAG--AAGG AGGAGGAGGA--GGCT ATGAGCTGTGA


ACTGCTATCAATGCCTTCCTGGACTGCAGTGTTGTGCTACCGCCTTCTGAAGTGCAGGGCGAGGAGCTGCTG
1990 2000 2010 2020 2030 2040 2050 2060


920 X

TGTACTATC


CGTTCTGTTGCCCATTTCC
2070


10. ELLIS-267-3 A
ECASNA  E. coli asn-A gene for asparagine-synthetase.

ID   ECASNA     standard; DNA; PRO; 2170 BP.
XX
AC   V00263;
XX
DT   07-APR-1983 (minor modifications)
DT   09-JAN-1982 (first entry)
DE   E. coli asn-A gene for asparagine-synthetase.
XX
KW   synthetase.
XX
OS   Escherichia coli
OC   Prokaryota; Bacteria; Gram-negative facultatively anaerobic rods;
OC   Enterobacteriaceae.
XX
RN   [1] (bases 1-2170)
RA   Nakamura M., Yamada M., Hirota Y., Sugimoto K., Oka A.,
RA   Takanami M.;
RT   "Nucleotide sequence of the asnA gene coding for asparagine
RT   synthetase of E. coli K-12";
RL   Nucleic Acids Res. 9:4669-4676(1981).
XX
FH   Key       From    To      Description
FH
FT   CDS       1034    2023    reading frame asn-A
XX
SQ   Sequence 2170 BP; 497 a; 524 c; 606 g; 543 t; 0 other;


Initial Score    = 67     Optimized Score = 418   Significance = 4.12
Residue Identity = 51%    Matches         = 508   Mismatches   = 358
Gaps             = 130    Conservative Substitutions           = 0


X 10 20 30 40 50



ATGT CC ATG AACTGCTGAGT — GGAT AAACA-GCACGGGAT ATCTC-TGTCT AAAGG — AAT 

J | | J j * | Ijiiil ill i t l lit ititi II i 

TTCTTTTTT AATG — AATCAAAAG— TGAGTT AGGCTTTTT ATTGAATGATT ATTGCATGTGTGTCGGTTTTT 
930 X 940 950 960 970 980 990 


60 70 80 90 100 110 120

ATT ACT ACACCAGGAAAAGGACAC ATT CGAC A ACAGG A A AGG AGCCTGT C AC AG A A A ACC AC AGTG 


i i 
• i 


GTTGCTT AATCAT AAGCAACAGGACGCAGGAGT AT AAAAAATGAAAACCG-CTT ACATTGCCAAACAACGTC 
1000 1010 1020 1030 1040 1050 1060 


130 140 150 160 170 180 190

TCCTGTGCAT--GTGACATTTCGCCATGGGAAACAACTGTTACAACG---TGGTGGTCATTGTGCTGCTGCT AG


AAATTAGCTTCGTGAAATCTCACTTTTCTCGTCAACTGGAAGAACGTCTGGGGCTGATCGAAGTCCAG--GCG
1070 1080 1090 1100 1110 1120 1130


200 210 220 230 240 250 

TGGGCTGTGAGAAG— GTGGGAGCCGTGCAGAACTCCTGTGATAAC— TGTCAGCCTG-GTACTTTC — TGCAG 

> 1 * l * ■ i i l l l i i it ii i i i l i i i till i it» • i i > i i i i 

• • i ii • i i i i i i i ii ii i i i t i i i i i i i i ill i i i i i • t t 

CCGATTCTT AGCCGTGTGGGGGATG— GC ACGC — AGGAT AACTTGT CGGGCTGTGAAAAAGCGGTGC AG 

1140 1150 1160 1170 1180 1190 1200 

260 270 280 290 300 310 320

— AAAAT ACAATCCAGTCTGCAAGAGCTGCCC TCCAA GT ACCTTCTCCAGCAT AGGTGGAC A— GCC 

• I • * > • < i i i i i ii i i i i i ii ti Ii i ill i ii i i i i i i i i i 

• < < > • i < i i i i i t» i i i i i it ii it i lit i ii i i i t i i i i i 

GT AA AAGTGAAAGC — TCTGCCTGA — TGCCCAGTTCGAAGTGGTTCATTCACTGGC— GAAGTGGAAACGTC 
1210 1220 1230 1240 1250 1260 

330 340 350 360 370 380 

— GAACT GTAACA — TCTGCAGAGTGTGTGCAGGCTATTTCAGGTTCAAGAAGTTTTGCTCCTCTACCC 

ill* l i i i i i i i I • i l 1 i i i l • i i I i • i i i i l i i i i ii 

I*'* • i i i i i i i i i i • i i i i i till i t i i i i i i i i i ii 

AGACCTT AGGGCAACACGACTTCAGCGCGGGCGAAGGGCTGT ACACGCACATGAA AGC-CCT-TCGCC 

1270 1280 1290 1300 1310 1320 1330 

390 400 410 420 430 440 450

ACAACGCGGA — GTGTGAGTGCATTGAA GGATTCCATTGCTTGGGGCCACAGTGCACCAG — ATGTGAA 

**l 1 I ill I I I i 1 I II I III it I I I I I I I i i i | 

III ii ill I l I I I I ii I ill ii I • I I I i I till 

CCGATGAAGACCGTCTTTCTCCGTTGCACTCGGTCTATGTTGACCAGTGGGACTGGGAACGCGTAATGGGCG 




460 470 480 490 500 510 520

AAGGACTGCAGGCCTGGCCAGGAGCTAACGAAGCAGGGTTG-CAAAACCTGTAGCTTGGGAACATTTAATGA 

'll ii i i > l ill i ■ i i i i i i i lilt ii till ttiii i 

I*' till i i til i i i i i i i i i iiii it iiii i i i i t i 

ACGGTGAGC-GTCAATTCTCGACTCTGA-AAAGCACGGTAGAGGCGATCTG-GGC — GGGA — ATT AAAGCA 
1410 1420 1430 1440 1450 1460 1470 


530 540 550 560 570 580 590 

CCAGAACGGTACTGGCGTCTGTCGACCCTGGAC — GAACTGCTCTCT AGACGGAAGGTCTGTGCTT AAGACC 


ii li 


ACCGAA— GCTGC— GGTTAGCGAAGAGTTTGGCCTGGCACCGTTC-CT— GCCGGA TCAGATC CACT 

1480 1490 1500 1510 1520 1530 


600 610 620 630 640 650 660 

GGGACCACGGAGAAGGACGTGGTGTGTGGACCCCCTGTGGTGAGCTTCTCTCCCAGTACCACCATTTCTGTG 

• till till I ill il li l • l l i l i i • i ii ti 

' iiii iiii i ill ii ii i i i i i iiii i ii ii 

TCGT ACACAG— CCAGGAGTT ACTGT CTCGTT ATCCGGATCTT— GATGCCA — AAGGGCGTGAGCG— G 

1540 1550 1560 1570 1580 1590 


670 680 690 700 710 720 730 

ACTCCAGAGGGAGGACCAGGAG— GGCACTCCTTGCAGGTCCTTACCTTGTTCCTG— GCGCT— GACATCGGCT 

1 III II I I I l I I I l I I I I I il li i i i i i l i i • i i i i i i 

• ill ii i i i i i i i i • r i i « ii ii i iiiiiitiiiitii 

GCGATAGCGAAAGATCTTGGCGCGGTATTCCTTGTCGG— GATTGGCGGCAAGCTGAGCGATGGTCATCGCCA 
1600 1610 1620 1630 1640 1650 1660 


740 750 760 770 780 790 800 

TTGC— TGCTGGC-CCTGATCTTCATTACTCTCCTGTTCTCTGTGCTCAAATGGATCAGGAAAAAATTCCCCC 

» • • » l i i i l i i Iiii il l i i t i i i ill ill i i 

* »»i i i i i i i i iiii li i i i i i i i ill ill i i 

CGACGTGCGCGCACCGGATT ATGATGA — CTGGAGCACCCCGT— CAGAGCTGGGCCATGCGGGTCTGAACGG 
1670 1680 1690 1700 1710 1720 1730 

810 820 830 840 850 860 

ACATATTC AAGCAACCATTT A — AGAAGA— CCACTGGAGC — AGCTCAAGAGGAAGATGCTTGT AGCT 

• i • • i i i iiii il i i i i i i i iiii ill < ii lit i i i i' i i i 

i i • • i • • iiii li iiiii i i iiii ill i il ill i i | | i | i 

CGATATTCTGGTGTGGAACCCGGTACTGGAAGATGCGTTTGAGCTTTCCTCCATGGG — GATCCGTGT AGAT 
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ABSTRACT:


The present invention relates to the treatment of wound healing
dysfunction by the administration of one or more wound healing
modulators. The wound healing modulator may be selected from appropriate
wound healing agents and binding partners, and particularly agents that
enhance wound healing. The agent may comprise a cytokine, or mixture of
cytokines that are also capable of binding to heparin, and inducing
localized inflammation characterized by polymorphonuclear cell
infiltration when administered subcutaneously. Particular agents comprise
the inflammatory cytokines MIP-1, MIP-1.alpha., MIP-1.beta, and MIP-2.
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